## Problem 3: Reading coordinates from a file and creating geometries (*5 points*) 

One of the most typical problems in GIS is having a set of coordinates in a file that you need to map. Python is a really handy tool for these kinds of situations, as it can read data from (basically) any kind of input file (such as CSV, TXT, Excel, and GPX (GPS data) files, databases, etc.).

Thus, let's see how we can read data from a file and create Point objects from it that can be saved e.g. as a new Shapefile (we will learn this next week). Our dataset **[travelTimes_2015_Helsinki.txt](data/travelTimes_2015_Helsinki.txt)** consists of
travel times between specific locations in the Helsinki Region. The first four lines of the file look like this:

```
   from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;total_route_time;route_time;route_distance
   5861326;5785640;5861326_5785640;1;08:10;24.9704379;60.3119173;24.8560344;60.399940599999994;125.0;99.0;22917.6
   5861326;5785641;5861326_5785641;1;08:10;24.9704379;60.3119173;24.8605682;60.4000135;123.0;102.0;23123.5
   5861326;5785642;5861326_5785642;1;08:10;24.9704379;60.3119173;24.865102;60.4000863;125.0;103.0;23241.3
```

As we can see, there are many columns in the data, but the few important ones needed here are:

| Column | Description |
|--------|-------------|
| from_x | x-coordinate of the **origin** location (longitude) |
| from_y | y-coordinate of the **origin** location (latitude) |
| to_x   | x-coordinate of the **destination** location (longitude)|
| to_y   | y-coordinate of the **destination** location (latitude) |
| total_route_time | Travel time with public transportation on the route |

Read more about the input data set at the Digital Geography Lab / Accessibility Research Group (University of Helsinki, Finland) website: https://blogs.helsinki.fi/accessibility/helsinki-region-travel-time-matrix/.

### Steps

1: Read the [data/travelTimes_2015_Helsinki.txt](data/travelTimes_2015_Helsinki.txt) file into a variable **`data`** using pandas.

**NOTE:** What is the separator in the data (see above)? Remember to take that into account when reading the data.
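
If you are unsure about the separator, one option is to let Python guess it from the first few lines. The following is a minimal sketch using the standard library's `csv.Sniffer` on the sample rows shown above; the `sample` string and the list of candidate delimiters are assumptions made for illustration, and in practice you would read these lines from the file itself.

```python
import csv

# First lines of the file, copied from the sample shown earlier (illustrative only)
sample = (
    "from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;"
    "total_route_time;route_time;route_distance\n"
    "5861326;5785640;5861326_5785640;1;08:10;24.9704379;60.3119173;"
    "24.8560344;60.399940599999994;125.0;99.0;22917.6\n"
    "5861326;5785641;5861326_5785641;1;08:10;24.9704379;60.3119173;"
    "24.8605682;60.4000135;123.0;102.0;23123.5\n"
)

# Ask the sniffer to choose among a few common candidate separators
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
separator = dialect.delimiter
print("Detected separator:", repr(separator))

# Parse the sample with the detected separator and count the columns
rows = list(csv.reader(sample.splitlines(), delimiter=separator))
print("Number of columns:", len(rows[0]))
```

The detected value can then be passed to `pd.read_csv()` through its `sep` parameter.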


In [27]:
import pandas as pd

# Read the semicolon-separated file; the relative path assumes the notebook sits next to the data folder
data = pd.read_csv('data/travelTimes_2015_Helsinki.txt', sep=';')


In [28]:
#Check how many rows
data.shape

(14643, 13)

In [29]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# This test print should print first five rows in the data (if not, something is incorrect):
print(data.head())

   from_id    to_id      fromid_toid  route_number     at     from_x  \
0  5861326  5785640  5861326_5785640             1  08:10  24.970438   
1  5861326  5785641  5861326_5785641             1  08:10  24.970438   
2  5861326  5785642  5861326_5785642             1  08:10  24.970438   
3  5861326  5785643  5861326_5785643             1  08:10  24.970438   
4  5861326  5787544  5861326_5787544             1  08:10  24.970438   

      from_y       to_x       to_y  total_route_time  route_time  \
0  60.311917  24.856034  60.399941             125.0        99.0   
1  60.311917  24.860568  60.400014             123.0       102.0   
2  60.311917  24.865102  60.400086             125.0       103.0   
3  60.311917  24.869636  60.400159             129.0       107.0   
4  60.311917  24.842582  60.397478             118.0        92.0   

   route_distance  route_total_lines  
0         22917.6                2.0  
1         23123.5                2.0  
2         23241.3                2.0  
3 

2: Select the 4 columns that contain coordinate information (**'from_x'**, **'from_y'**, **'to_x'**, **'to_y'**) and store them in the variable **`data`** (i.e. update the `data` variable).


In [31]:
data = data[['from_x', 'from_y', 'to_x', 'to_y']]
data

|       | from_x    | from_y    | to_x      | to_y      |
|-------|-----------|-----------|-----------|-----------|
| 0     | 24.970438 | 60.311917 | 24.856034 | 60.399941 |
| 1     | 24.970438 | 60.311917 | 24.860568 | 60.400014 |
| 2     | 24.970438 | 60.311917 | 24.865102 | 60.400086 |
| 3     | 24.970438 | 60.311917 | 24.869636 | 60.400159 |
| 4     | 24.970438 | 60.311917 | 24.842582 | 60.397478 |
| ...   | ...       | ...       | ...       | ...       |
| 14638 | 24.970438 | 60.311917 | 24.559702 | 60.174754 |
| 14639 | 24.970438 | 60.311917 | 24.564204 | 60.174837 |
| 14640 | 24.970438 | 60.311917 | 24.555367 | 60.172428 |
| 14641 | 24.970438 | 60.311917 | 24.559868 | 60.172511 |


In [32]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
print(list(data.columns))

['from_x', 'from_y', 'to_x', 'to_y']


3: Create (two) empty lists for points called **`orig_points`** and **`dest_points`**


In [45]:
orig_points = []
dest_points = []

In [46]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# List length should be zero at this point:
print('orig_points length:', len(orig_points))
print('dest_points length:', len(dest_points))

orig_points length: 0
dest_points length: 0


4: Create Shapely points for each origin and destination, and add the origin points to the `orig_points` list and the destination points to the `dest_points` list.

**HOW?:**

- Create a for-loop and iterate over the rows of your DataFrame
- For each row, create Shapely Point objects based on the coordinate columns (columns `from_x` and `from_y` for the origins and columns `to_x` and `to_y` for the destinations)
- Append the point objects to the **`orig_points`** list and the **`dest_points`** list.

See the Geo-Python Lesson 6 materials [for iterating over DataFrame rows](https://geo-python.github.io/site/notebooks/L6/pandas/advanced-data-processing-with-pandas.html#Iterating-rows-and-using-self-made-functions-in-Pandas) if you do not remember how to do it.
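
The steps above can be sketched on a tiny stand-in table. This is a minimal sketch, assuming pandas and Shapely are installed; the two-row DataFrame `sample` (values copied from the first rows shown earlier) and the list names `sample_origins` and `sample_destinations` are illustrative, chosen so that they do not clash with the lists used in the exercise.

```python
import pandas as pd
from shapely.geometry import Point

# Tiny stand-in for the real data (coordinates copied from the first two sample rows)
sample = pd.DataFrame({
    "from_x": [24.9704379, 24.9704379],
    "from_y": [60.3119173, 60.3119173],
    "to_x": [24.8560344, 24.8605682],
    "to_y": [60.399940599999994, 60.4000135],
})

sample_origins = []
sample_destinations = []

# iterrows() yields (index, row) pairs; row values are accessed by column name
for index, row in sample.iterrows():
    sample_origins.append(Point(row["from_x"], row["from_y"]))
    sample_destinations.append(Point(row["to_x"], row["to_y"]))

print(len(sample_origins), len(sample_destinations))
print(sample_destinations[0].x, sample_destinations[0].y)
```

Note that x (longitude) comes first and y (latitude) second when creating a Point.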

In [47]:
from shapely.geometry import Point, LineString

# Create one origin Point and one destination Point per row
for index, row in data.iterrows():
    orig_points.append(Point(row['from_x'], row['from_y']))
    dest_points.append(Point(row['to_x'], row['to_y']))
    



**NOTE: After you have solved this problem, we recommend that you restart the kernel and run all cells again! Otherwise you might append the same coordinates to the lists many times if you run the cell multiple times.**
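
To see why re-running the cell matters, the sketch below mimics the situation with plain tuples instead of Point objects. The `coords` list, the `fill()` helper, and the two list names are hypothetical and only serve the illustration. Because `append()` adds to whatever the list already holds, running the loop twice doubles the length of the list, whereas re-creating the empty list right before the loop keeps the result stable.

```python
# Stand-in coordinates (plain tuples instead of Shapely Points, for illustration)
coords = [(24.9704379, 60.3119173), (24.8560344, 60.3999406)]

def fill(target):
    """Append every coordinate to the given list, like the loop in step 4."""
    for xy in coords:
        target.append(xy)

# Case 1: the list is created once and the loop is run twice
points_rerun = []
fill(points_rerun)
fill(points_rerun)  # simulates running the cell a second time
print("Without reset:", len(points_rerun))

# Case 2: the list is re-created right before the second run
points_reset = []
fill(points_reset)
points_reset = []   # resetting in the same cell as the loop
fill(points_reset)
print("With reset:", len(points_reset))
```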

In [48]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# This test print should print out the first origin and destination coordinates in the two lists:
print("ORIGIN X Y:", orig_points[0].x, orig_points[0].y)
print("DESTINATION X Y:", dest_points[0].x, dest_points[0].y)

#Check that you created a correct amount of points:
assert len(orig_points) == len(data), "Number of origin points must be the same as number of rows in the original file"
assert len(dest_points) == len(data), "Number of destination points must be the same as number of rows in the original file"

ORIGIN X Y: 24.9704379 60.3119173
DESTINATION X Y: 24.8560344 60.3999406



- Upload your code and edits to your **own** personal GitHub repository for Exercise-1 in AutoGIS-2018.

## Done!

That's it. Now you are ready to continue to the final Problem 4.

## Problem 4: Creating LineStrings that represent the movements (*5 points*)

This task continues where we left off in Problem 3.
   
1: Create a list called `lines`


In [50]:
lines = []

In [51]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# Lines length should be zero at this stage:
print('lines length:', len(lines))

lines length: 0


2a: Iterate over the origin and destination lists and create a Shapely LineString object between each origin and destination point

  - Hint - Alternative 1: You can take advantage of the `range()` function here, which can help in accessing the values from two lists at the same time.

  - Hint - Alternative 2: You can use the `zip()` function to iterate over many lists at the same time. [See hints for this week](https://automating-gis-processes.github.io/2018/lessons/L1/ex-1.html#hints)
  
2b: Add the LineString into the `lines` -list.
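
Both hints lead to the same pairs. A minimal sketch, using short lists of plain coordinate tuples as stand-ins for the Point lists (the names `origins`, `destinations`, `pairs_by_index` and `pairs_by_zip` are illustrative):

```python
origins = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
destinations = [(3.0, 4.0), (1.0, 5.0), (2.0, 0.0)]

# Alternative 1: use range() to generate indices that are valid for both lists
pairs_by_index = []
for i in range(len(origins)):
    pairs_by_index.append((origins[i], destinations[i]))

# Alternative 2: use zip() to walk through both lists in parallel
pairs_by_zip = []
for orig, dest in zip(origins, destinations):
    pairs_by_zip.append((orig, dest))

print(pairs_by_index == pairs_by_zip)
```

One difference to keep in mind: `zip()` silently stops at the end of the shorter list, while indexing with `range(len(origins))` raises an `IndexError` if `destinations` is shorter.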


In [53]:
# Pair each origin with its destination and connect them with a line
for orig, dest in zip(orig_points, dest_points):
    line = LineString([orig, dest])
    lines.append(line)
    


**NOTE: After you have solved this problem, we recommend that you restart the kernel and run all cells again! Otherwise you might append the same points to the lists many times if you run the cell multiple times.**

In [56]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

#Test that the list has correct number of LineStrings
assert len(lines) == len(data), "There should be as many lines as there are rows in the original data"

3: Create a variable called **`total_length`**, and store the total (Euclidean) distance of all the origin-destination LineStrings that we just created into that variable.

  - Hint: You might want to iterate over the lines and update the total length on each iteration.
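
For a line between two points, the Euclidean length is `sqrt((x2 - x1)**2 + (y2 - y1)**2)`, which is what the `length` attribute of a two-point LineString corresponds to. It is computed in the units of the coordinates, so with longitudes and latitudes the result is in decimal degrees rather than metres. A minimal sketch with the standard library, using coordinates from the first rows of the sample (the `od_pairs` list and the variable `total` are illustrative):

```python
import math

# (origin, destination) coordinate pairs taken from the first two sample rows
od_pairs = [
    ((24.9704379, 60.3119173), (24.8560344, 60.399940599999994)),
    ((24.9704379, 60.3119173), (24.8605682, 60.4000135)),
]

total = 0.0
for orig, dest in od_pairs:
    # math.dist() returns the straight-line (Euclidean) distance between two points
    total += math.dist(orig, dest)

print("Total length in decimal degrees:", round(total, 4))
```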


In [65]:
# Accumulate the length of every line (in the units of the coordinates)
total_length = 0
for line in lines:
    total_length += line.length


In [66]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# This test print should print the total length of all lines
print("Total length of all lines is", round(total_length, 2))

Total length of all lines is 3148.57


4: Write the previous parts, i.e. the creation of the LineStrings and the calculation of the total distance, into dedicated functions:

- `create_od_lines()`: Takes two lists of Shapely Point objects as input and returns a list of LineStrings
- `calculate_total_distance()`: Takes a list of LineString geometries as input and returns their total length

You can copy and paste the code you have written earlier into the functions. Below, you can find a code cell for testing your functions (you should get the same result as earlier).

In [68]:
def create_od_lines(orig_points, dest_points):
    """Return a list of LineStrings connecting each origin to its destination."""
    lines = []
    for orig, dest in zip(orig_points, dest_points):
        line = LineString([orig, dest])
        lines.append(line)
    return lines

def calculate_total_distance(lines):
    """Return the summed length of all LineStrings in the list."""
    total_length = 0
    for line in lines:
        total_length += line.length
    return total_length



In [69]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION

# Use the functions
# -----------------

# Create origin-destination lines
od_lines = create_od_lines(orig_points, dest_points)

# Calculate the total distance
tot_dist = calculate_total_distance(od_lines)

print("Total distance", round(tot_dist,2))


Total distance 3148.57



## All done!

Awesome, now you have successfully practiced how geometries can be created in Python. Next week we will start using them actively.