# Introduction to Python and Kepler
### LODES Commuting Summer Project 2021
## Getting Started
Before anything else, you need to set Python up on your computer. The version of Python we will use is Anaconda Python; see https://www.anaconda.com/download/. Using the instructions on this page, install Python (it’s free). You will want the Python 3.7 version (not 2.7). If you are offered a choice between 64-bit and 32-bit versions, you’ll want 64-bit unless your laptop is very old. The download is large (~614 MB), so it may take a while. The installer may also ask whether you want to install something called VS Code; you won’t need this, so I would suggest saying ‘no’.

There are a number of ways to actually run Python. Here, we will use Navigator to launch the Jupyter app. This is done by first starting Navigator, and from there launching Jupyter. To start up Navigator, follow the instructions here: https://docs.anaconda.com/anaconda/navigator/getting-started/#navigator-starting-navigator. When it has started, you will see a window similar to this:

<img src="image.png">

If the ```Jupyter Notebook``` window has a button that says __Install__, click on it; this installs some software onto your machine. Follow any instructions you get. Once installed, the button will say __Launch__ instead of __Install__. Click __Launch__ to start the ```Jupyter Notebook``` app, which runs Python through a web page.

It should look something like this:

<img src="download.png">

You can also open Jupyter Notebook through the Command Prompt (Windows) or Terminal (MacOS). To open the Terminal on Mac computers, click on the Spotlight icon (magnifying glass) in the top right corner of the desktop and type “Terminal”. Double-click on the “Terminal” icon and it will open.

<img src="Screenshot2.png">

In general, Python is a somewhat similar programming language to R, but is slightly more intuitive, visually attractive, and powerful (although it does not have the full range of specific statistical packages that R does). It is generally the programming language of choice for data scientists and others doing machine learning and/or working with very large datasets. The code itself runs in your computer’s command line, but the newly-developed ```Jupyter Notebooks``` run in your web browser and provide an aesthetically-pleasing and concise method for running markdown (text instructions), code, code outputs, and plots all in one location.

Similar to R, Python is made up of a number of open source packages developed by users. Anaconda gives you access to a number of very useful packages, including ```matplotlib```, ```pandas```, ```numpy```, ```geopandas```, and ```keplergl``` (KeplerGL). Some of these may already be installed; to install any that are missing, type ```conda install [PACKAGENAME]``` in the command prompt and hit “enter” for each package.

Once these five packages have been installed properly, type “jupyter notebook” in the command prompt or terminal and hit enter (or go back to your open Jupyter Notebook directory).
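
If you are not sure whether all five packages installed properly, a short check like the following can be run in a notebook cell or a Python prompt. It is only a sketch: the helper name `missing_packages` is made up for this example, and the list assumes the import names `matplotlib`, `pandas`, `numpy`, `geopandas`, and `keplergl`.

```python
import importlib.util

# Import names of the packages this tutorial relies on (assumed names).
REQUIRED = ["matplotlib", "pandas", "numpy", "geopandas", "keplergl"]


def missing_packages(names):
    """Return the names that Python cannot find in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Any name that is printed as missing can then be installed with ```conda install``` as described above.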

*Note: You can open multiple Command Prompt or Terminal windows at one time. This might be necessary, because the command prompt window that you use to open Jupyter needs to stay open for as long as you are using Jupyter (and cannot be used for additional commands, e.g., ```conda install```). You can minimize the command prompt window, but if you close it, the kernel will be lost and you will have to re-open the command prompt and re-start Jupyter.*

## Using Python

Now, navigate to the folder in which you saved the data for Lab 2. Click on the file “Python Kepler + QGIS.ipynb” and it will open in a new browser tab (to open a new blank notebook, click on the “New” dropdown menu and select “Python 3” under “Notebook”).

A notebook is similar to an R script in that it contains all of the code for calculating the access scores in the lab. It also has several advantages, however: 1) since it runs in the web browser, it is very easy to place online for replication, dissemination, and viewing, 2) the markdown text can be formatted and is more visually attractive (like a webpage) for easier comprehension, and 3) code is run in chunks in individual cells (see below), with the output for that cell displayed immediately below it; this allows the user to follow the “story” of the code and understand each piece much more easily.

*Note: Both code and markdown text can be edited in the notebook; if you make any changes be sure to __Save__. You can also run the code in a given cell by clicking inside of it and pressing the __Run__ button (or hitting Shift + Enter). The code in the selected cell will run and the cursor will jump to the next cell, so you can easily run through your code by pressing the __Run__ button multiple times.*

### Load Packages and Data

Before running any code, we have to load the required packages into this notebook session's memory. To do that, we use the ```import``` command.

In [13]:
import pandas as pd
import geopandas as gpd
from keplergl import KeplerGl

The CSV file with the tract-to-tract flows that we created in Lab 2 is called "Mode_Flows.csv". The next bit of code loads that CSV as a pandas dataframe and displays the first 5 rows. Notice that Python counts from 0, so the first row of a dataframe (or series) is at position "0", not "1".

In [14]:
#Read CSV
df = pd.read_csv('Mode_Flows.csv', index_col=0)
# N_BIKE2 column has to be sorted to be recognized as "float" by Kepler
df.sort_values('N_BIKE2', ascending=False, inplace=True)
df.head()

| Unnamed: 0 | TRIP_ID | origTR | destTR | h_lat | h_long | w_lat | w_long | JTOT | JTOTAUT | JTOTTRN | ... | JTOTAUT_R | N_TRAN2_R | N_BIKE2_R | N_WALK2_R | JTOTWFH_R | ADD_TRAN | ADD_BIKE | ADD_WALK | DIST | DIST_EUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 129940 | 17031330100, 17031839100 | 17031330100 | 17031839100 | 41.85926 | -87.61729 | 41.88103 | -87.63275 | 2213 | 1070.308 | 559.8077 | ... | 0.119505 | 0.302385 | 1.029552 |  | 0.735144 | 91.03076 | 53.92086 | 0.0 | 3.8 | 2.787876 |
| 288779 | 17031809300, 17031808702 | 17031809300 | 17031808702 | 42.05795 | -87.68647 | 42.05478 | -87.6756 | 203 | 63.37374 | 32.38819 | ... | 0.217555 | 0.25385 | 1.217923 | 0.257086 | 1.36659 | 6.26104 | 52.48731 | 165.3901 | 1.383333 | 1.014005 |
| 137446 | 17031410500, 17031836200 | 17031410500 | 17031836200 | 41.79783 | -87.60375 | 41.79047 | -87.60128 | 178 | 66.14286 | 49.14286 | ... | 0.247851 | 0.325096 | 0.86415 | 0.542782 | 1.717436 | 11.21672 | 40.50756 | 144.5772 | 1.05 | 0.893249 |
| 138446 | 17031410900, 17031836200 | 17031410900 | 17031836200 | 41.7972 | -87.58236 | 41.79047 | -87.60128 | 299 | 110.6282 | 88.28811 | ... | 0.171605 | 0.317728 | 1.027358 | 0.473313 | 0.85122 | 20.49406 | 42.32249 | 188.4114 | 2.3 | 1.788373 |
| 138723 | 17031411000, 17031836200 | 17031411000 | 17031836200 | 41.79051 | -87.58314 | 41.79047 | -87.60128 | 278 | 89.71514 | 99.40505 | ... | 0.304019 | 0.22138 | 1.341913 | 0.484079 | 1.182419 | 12.19467 | 37.48069 | 142.2438 | 1.8 | 1.554379 |
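
The difference between a row's position (counted from 0) and its label (taken from the CSV's first column when ```index_col=0``` is used) can be easy to miss after sorting. The sketch below illustrates it with a made-up three-row stand-in for "Mode_Flows.csv"; the values are invented for illustration, and only the column names ```TRIP_ID```, ```JTOT```, and ```N_BIKE2``` echo the real file.

```python
import io

import pandas as pd

# Made-up three-row stand-in for Mode_Flows.csv (values are illustrative only).
csv_text = """,TRIP_ID,JTOT,N_BIKE2
10,A,5,
11,B,9,2.5
12,C,7,4.0
"""

# index_col=0 turns the first CSV column into the row labels.
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Sort from largest to smallest N_BIKE2; missing values go to the bottom.
demo = demo.sort_values("N_BIKE2", ascending=False)

first_by_position = demo.iloc[0]  # position 0 is the first row after sorting
first_by_label = demo.loc[12]     # label 12 comes from the CSV's first column

print(first_by_position["TRIP_ID"])
print(demo.index.tolist())
```

Here ```.iloc``` selects by position and ```.loc``` selects by label, so both lines pick the same row even though its label is 12 and its position is 0.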


### Initializing Kepler in Python
The Python version of Kepler is much more stable than the web browser version. In order to use it, we first create a blank map of a given height, named "kepler_map".

In [15]:
#Load a blank map
kepler_map = KeplerGl(height=800)

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


Now we add the Mode_Flows dataframe (which we named "df") to the map and initialize a Graphical User Interface (GUI) directly in the notebook.

In [None]:
#Add data - you can add multiple files to the map in this way
kepler_map.add_data(data=df, name="Mode Flows")
# testract = gpd.read_file('Cook_Tracts_TransitWGS.geojson')
# kepler_map.add_data(data=testract, name="Tracts")
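
Since the loading cell notes that ```N_BIKE2``` needs to be recognized as a float, it can help to look at the column types pandas has inferred before handing a dataframe to the map. The following is a generic pandas check on made-up rows, not a documented Kepler requirement; the dataframe name `flows` and its values are hypothetical.

```python
import pandas as pd

# Made-up rows; the column names follow Mode_Flows.csv but the values do not.
flows = pd.DataFrame({
    "h_lat": [41.85, 42.05],
    "h_long": [-87.61, -87.68],
    "N_BIKE2": ["53.9", "n/a"],  # stored as text on purpose
})

# Show the type pandas has inferred for every column.
print(flows.dtypes)

# Convert the text column to numbers; values that cannot be parsed become NaN.
flows["N_BIKE2"] = pd.to_numeric(flows["N_BIKE2"], errors="coerce")
print(flows["N_BIKE2"].dtype)
```

After the conversion the column is numeric, with a missing value where the text could not be read as a number.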

In [None]:
#Initialize map
kepler_map

Clicking on the small arrow on the left allows you to manipulate the layers that you loaded. Now we can display the flows with origins and destinations in different colors. We can also export the map (with loaded data) to a separate HTML file to manipulate directly in a web browser.

In [None]:
# Export to an HTML file (keplergl_map.html by default) that you can open in your web browser.
kepler_map.save_to_html()
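
If you would rather choose the name of the exported file, ```save_to_html``` accepts a ```file_name``` keyword. The small wrapper below is a sketch under that assumption; the function name `export_map` and the default file name "mode_flows_map.html" are made up for this example.

```python
from pathlib import Path


def export_map(kepler_map, file_name="mode_flows_map.html"):
    """Save the map to file_name and return the absolute path of the file."""
    # file_name is passed straight through to KeplerGl.save_to_html.
    kepler_map.save_to_html(file_name=file_name)
    return Path(file_name).resolve()
```

In the notebook you could call ```path = export_map(kepler_map)``` and then open the result with Python's built-in ```webbrowser``` module, e.g. ```webbrowser.open(path.as_uri())```.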