## Problem 3: Reading coordinates from a file and creating geometries (*5 points*)

One of the most typical problems in GIS is having a set of coordinates in a file that you need to map. Python is a really handy tool for this kind of situation, as it can read data from (basically) any kind of input file (such as CSV, TXT, Excel, or GPX files (GPS data), databases, etc.).

Thus, let's see how we can read data from a file and create Point objects from them that can be saved, e.g., as a new Shapefile (we will learn this next week). Our dataset **[travelTimes_2015_Helsinki.txt](data/travelTimes_2015_Helsinki.txt)** consists of
travel times between specific locations in the Helsinki Region. The first four rows of our data look like this:

```
   from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;total_route_time;route_time;route_distance
   5861326;5785640;5861326_5785640;1;08:10;24.9704379;60.3119173;24.8560344;60.399940599999994;125.0;99.0;22917.6
   5861326;5785641;5861326_5785641;1;08:10;24.9704379;60.3119173;24.8605682;60.4000135;123.0;102.0;23123.5
   5861326;5785642;5861326_5785642;1;08:10;24.9704379;60.3119173;24.865102;60.4000863;125.0;103.0;23241.3
```

As we can see, there are many columns in the data, but the few important ones needed here are:

| Column | Description |
|--------|-------------|
| from_x | x-coordinate of the **origin** location (longitude) |
| from_y | y-coordinate of the **origin** location (latitude) |
| to_x   | x-coordinate of the **destination** location (longitude)|
| to_y   | y-coordinate of the **destination** location (latitude) |
| total_route_time | Travel time with public transportation on the route |

### Steps

1: Read the [data/travelTimes_2015_Helsinki.txt](data/travelTimes_2015_Helsinki.txt) file into a variable **`data`** using pandas.

  - What is the separator in the data (see above)? Remember to take that into account when reading the data.


In [1]:
# REPLACE THE ERROR BELOW WITH YOUR OWN CODE
from shapely.geometry import Point, LineString, Polygon
import pandas as pd

# The file is semicolon-separated, so the separator is passed explicitly
data = pd.read_csv('travelTimes_2015_Helsinki.txt', sep=';')

# This test print should print first five rows in the data (if not, something is incorrect):
data.head()
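
The effect of the separator can be checked without the real file. The following is a minimal sketch that feeds two of the sample rows shown above to pandas through an in-memory buffer; the names `sample`, `sample_df` and `wrong_df` are illustrative and not part of the exercise.

```python
import io

import pandas as pd

# Header and two data rows copied from the preview above; this buffer is a
# stand-in for the real file (illustrative only).
sample = io.StringIO(
    "from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;"
    "total_route_time;route_time;route_distance\n"
    "5861326;5785640;5861326_5785640;1;08:10;24.9704379;60.3119173;"
    "24.8560344;60.399940599999994;125.0;99.0;22917.6\n"
    "5861326;5785641;5861326_5785641;1;08:10;24.9704379;60.3119173;"
    "24.8605682;60.4000135;123.0;102.0;23123.5\n"
)

# With sep=';' every field ends up in its own column
sample_df = pd.read_csv(sample, sep=";")

# With the default separator (a comma) each line is read as one single field
sample.seek(0)
wrong_df = pd.read_csv(sample)

print(sample_df.shape)
print(wrong_df.shape)
```

If the second shape reports only one column, the separator was not taken into account.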


In [2]:
# Display the DataFrame (pandas abbreviates the middle rows of long tables):
data

| | from_id | to_id | fromid_toid | route_number | at | from_x | from_y | to_x | to_y | total_route_time | route_time | route_distance | route_total_lines |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5861326 | 5785640 | 5861326_5785640 | 1 | 08:10 | 24.970438 | 60.311917 | 24.856034 | 60.399941 | 125.0 | 99.0 | 22917.6 | 2.0 |
| 1 | 5861326 | 5785641 | 5861326_5785641 | 1 | 08:10 | 24.970438 | 60.311917 | 24.860568 | 60.400014 | 123.0 | 102.0 | 23123.5 | 2.0 |
| 2 | 5861326 | 5785642 | 5861326_5785642 | 1 | 08:10 | 24.970438 | 60.311917 | 24.865102 | 60.400086 | 125.0 | 103.0 | 23241.3 | 2.0 |
| 3 | 5861326 | 5785643 | 5861326_5785643 | 1 | 08:10 | 24.970438 | 60.311917 | 24.869636 | 60.400159 | 129.0 | 107.0 | 23534.2 | 2.0 |
| 4 | 5861326 | 5787544 | 5861326_5787544 | 1 | 08:10 | 24.970438 | 60.311917 | 24.842582 | 60.397478 | 118.0 | 92.0 | 22428.2 | 2.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14638 | 5861326 | 5967091 | 5861326_5967091 | 1 | 08:06 | 24.970438 | 60.311917 | 24.559702 | 60.174754 | 94.0 | 72.0 | 40702.8 | 2.0 |
| 14639 | 5861326 | 5967092 | 5861326_5967092 | 1 | 08:06 | 24.970438 | 60.311917 | 24.564204 | 60.174837 | 97.0 | 75.0 | 40915.0 | 2.0 |
| 14640 | 5861326 | 5968733 | 5861326_5968733 | 1 | 08:06 | 24.970438 | 60.311917 | 24.555367 | 60.172428 | 89.0 | 66.0 | 40305.9 | 2.0 |
| 14641 | 5861326 | 5968734 | 5861326_5968734 | 1 | 08:06 | 24.970438 | 60.311917 | 24.559868 | 60.172511 | 93.0 | 71.0 | 40628.0 | 2.0 |


2: Select four columns from the DataFrame, **'from_x'**, **'from_y'**, **'to_x'** and **'to_y'**, and store them in the variable **`data`** (i.e. update the `data` variable).


In [3]:
# REPLACE THE ERROR BELOW WITH YOUR OWN CODE
data = data[['from_x','from_y','to_x','to_y']]


In [4]:
# Print out the columns of the 'data' DataFrame:
print(data.columns)


In [5]:
data

| | from_x | from_y | to_x | to_y |
|---|---|---|---|---|
| 0 | 24.970438 | 60.311917 | 24.856034 | 60.399941 |
| 1 | 24.970438 | 60.311917 | 24.860568 | 60.400014 |
| 2 | 24.970438 | 60.311917 | 24.865102 | 60.400086 |
| 3 | 24.970438 | 60.311917 | 24.869636 | 60.400159 |
| 4 | 24.970438 | 60.311917 | 24.842582 | 60.397478 |
| ... | ... | ... | ... | ... |
| 14638 | 24.970438 | 60.311917 | 24.559702 | 60.174754 |
| 14639 | 24.970438 | 60.311917 | 24.564204 | 60.174837 |
| 14640 | 24.970438 | 60.311917 | 24.555367 | 60.172428 |
| 14641 | 24.970438 | 60.311917 | 24.559868 | 60.172511 |


3: Create two lists for points called **`orig_points`** and **`dest_points`**.


In [6]:
# REPLACE THE ERROR BELOW WITH YOUR OWN CODE
orig_points = []
dest_points = []




In [7]:
# These test prints should produce empty lists
print('orig_points length:', len(orig_points))
print('dest_points length:', len(dest_points))


orig_points length: 0
dest_points length: 0


4: Iterate over the rows of your DataFrame and add Shapely Point objects to the **`orig_points`** list and the **`dest_points`** list, representing the origin locations (columns `from_x` and `from_y`) and the destination locations (columns `to_x` and `to_y`), respectively.

  - See lesson materials from [Geo-Python Lesson 6](https://geo-python.github.io/2018/notebooks/L6/pandas/advanced-data-processing-with-pandas.html#Iterating-rows-and-using-self-made-functions-in-Pandas) if you do not remember how to iterate over DataFrame rows


In [8]:
# REPLACE THE ERROR BELOW WITH YOUR OWN CODE
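for i in range(len(data)):
    # Note: the coordinates are stored as plain (x, y) tuples here;
    # they are wrapped in Shapely Point objects later, in Problem 4.
    orig_temp = (data.from_x[i], data.from_y[i])
    dest_temp = (data.to_x[i], data.to_y[i])
    orig_points.append(orig_temp)
    dest_points.append(dest_temp)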


In [9]:
# This test print should print first five origin Point -objects
print(orig_points[0:5])


[(24.9704379, 60.3119173), (24.9704379, 60.3119173), (24.9704379, 60.3119173), (24.9704379, 60.3119173), (24.9704379, 60.3119173)]
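
The step asks for Shapely Point objects, whereas the output above shows coordinate tuples. A minimal sketch of storing actual Point objects while iterating over rows with `iterrows()` could look like the following. The small `sample` DataFrame and the names `orig_pts` and `dest_pts` are stand-ins chosen for illustration, so that the block runs without the data file.

```python
import pandas as pd
from shapely.geometry import Point

# Two rows from the preview above, used as a stand-in for `data`
sample = pd.DataFrame({
    "from_x": [24.9704379, 24.9704379],
    "from_y": [60.3119173, 60.3119173],
    "to_x": [24.8560344, 24.8605682],
    "to_y": [60.399940599999994, 60.4000135],
})

orig_pts = []
dest_pts = []

# iterrows() yields (index, row) pairs; row values are accessed by column name
for idx, row in sample.iterrows():
    orig_pts.append(Point(row["from_x"], row["from_y"]))
    dest_pts.append(Point(row["to_x"], row["to_y"]))

# Each list element is now a Point with .x and .y attributes
print(orig_pts[0].x, orig_pts[0].y)
```

Storing Point objects directly means the lists can be passed to `LineString` later without any conversion.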



- Upload your code and edits to your **own** personal GitHub repository for Exercise-1 in AutoGIS-2018.

## Done!

That's it. Now you are ready to continue to the final problem, Problem 4.

## Problem 4: Creating LineStrings that represent the movements (*5 points*)

This task continues where we left off in Problem 3.
   
1: Create a list called `lines`


In [10]:
# REPLACE THE ERROR BELOW WITH YOUR OWN CODE
lines = []

In [11]:
# This test print should produce empty list
print('lines length:', len(lines))

lines length: 0


2a: Iterate over the origin and destination lists and create a Shapely LineString object between each origin and destination point.

  - Hint - Alternative 1: You can take advantage of the `range()` function here, which helps you access the values from two lists at the same time.
     
  - Hint - Alternative 2: You can use the `zip()` function to iterate over many lists at the same time. [See hints for this week](https://automating-gis-processes.github.io/2018/lessons/L1/exercise-1.html#hints)
  
2b: Add the LineString to the `lines` list.
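
The `zip()` alternative from the hints can be sketched as follows. The coordinate pairs are taken from the data preview, and the names `origins`, `destinations` and `sample_lines` are illustrative rather than the variables required by the exercise.

```python
from shapely.geometry import LineString

# Stand-ins for the origin and destination lists (coordinate tuples)
origins = [(24.9704379, 60.3119173), (24.9704379, 60.3119173)]
destinations = [(24.8560344, 60.399940599999994), (24.8605682, 60.4000135)]

# The list is created in the same cell as the loop, so re-running the
# cell starts from an empty list instead of appending duplicates.
sample_lines = []

# zip() pairs the first origin with the first destination, and so on
for orig, dest in zip(origins, destinations):
    sample_lines.append(LineString([orig, dest]))

print(len(sample_lines))
```

`LineString` accepts coordinate tuples as well as Point objects, so the same loop works for either kind of list.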


In [25]:
# REPLACE THE ERROR BELOW WITH YOUR OWN CODE
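for i in range(len(data)):
    # Build Point objects from the stored coordinate pairs.
    # (Named p_orig / p_dest so that the pandas alias `pd` is not overwritten.)
    p_orig = Point(orig_points[i])
    p_dest = Point(dest_points[i])
    # Connect the origin and the destination, then store the line
    line_temp = LineString([p_orig, p_dest])
    lines.append(line_temp)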


lines length: 43929


In [28]:
# This test print should print first five LineString -objects
print(lines[0:5])


[<shapely.geometry.linestring.LineString object at 0x7f0e17f5e7f0>, <shapely.geometry.linestring.LineString object at 0x7f0e18218d68>, <shapely.geometry.linestring.LineString object at 0x7f0e17f5eb70>, <shapely.geometry.linestring.LineString object at 0x7f0e17f5ecf8>, <shapely.geometry.linestring.LineString object at 0x7f0e17f5e828>]


--------------------------

3: Create a variable called **`total_length`**, and store the total (Euclidean) distance of all the origin-destination LineStrings that we just created in that variable.

  - Hint: You might want to iterate over the lines and update the total length on each iteration.
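
As a small worked example of this accumulation, the sketch below sums the `length` attribute of two made-up lines whose lengths are easy to verify by hand (a 3-4-5 triangle side and a vertical segment of length 2). The names `demo_lines` and `demo_total` are illustrative. Shapely computes planar lengths in the units of the coordinates, so with longitude and latitude values the result is in decimal degrees, not metres.

```python
from shapely.geometry import LineString

# Two made-up lines with lengths that are easy to check by hand
demo_lines = [
    LineString([(0, 0), (3, 4)]),  # length 5
    LineString([(0, 0), (0, 2)]),  # length 2
]

# Accumulate the total on each iteration, as the hint suggests
demo_total = 0
for line in demo_lines:
    demo_total += line.length

print(demo_total)  # 7.0
```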


In [36]:
# REPLACE THE ERROR BELOW WITH YOUR OWN CODE
total_length = 0

# Add up the length of every LineString (in the units of the coordinates)
for line in lines:
    total_length += line.length

In [37]:
# This test print should print the total length of all lines
print("Total length of all lines is", total_length)


Total length of all lines is 9445.712342595383


4: To make things more reusable, write the previous parts, i.e. the creation of the LineStrings and the calculation of the total distance, as dedicated functions and use them. You can copy and paste the code you have written earlier.

  - You can name the functions as you wish
  - Hint: Your function should take origin and destination point lists as input
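
One possible shape for such functions is sketched below. The function names `create_od_lines` and `calculate_total_distance` are hypothetical (the exercise lets you choose any names), and the input coordinates are made up so that the result can be checked by hand.

```python
from shapely.geometry import LineString


def create_od_lines(orig_points, dest_points):
    """Return a list of LineStrings, one per origin-destination pair."""
    return [
        LineString([orig, dest])
        for orig, dest in zip(orig_points, dest_points)
    ]


def calculate_total_distance(lines):
    """Return the summed length of all lines, in coordinate units."""
    return sum(line.length for line in lines)


# Usage with made-up coordinates: lengths 5 and 2, total 7
od_lines = create_od_lines([(0, 0), (1, 1)], [(3, 4), (1, 3)])
od_total = calculate_total_distance(od_lines)
print(od_total)  # 7.0
```

Because the first function returns a new list on every call, calling it repeatedly does not accumulate duplicate lines the way repeated appends to a shared list would.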

In [None]:
# REPLACE THE ERROR BELOW WITH YOUR OWN CODE


## All done!

Awesome, now you have successfully practiced how geometries can be created in Python. Next week we will start using them actively.