# Using databases with Python: week 5

From [Coursera](https://www.coursera.org/learn/python-databases).

July 2022.



# Multi-step data analysis

Now we're going to combine everything we've done so far. 

* Gather info from a data source  
    * Keep this part relatively simple  
    * Don't try to do any cleaning here  
    * This can take a long time, so you may want to do it in batches. Design it to be a restartable process.  
* Store that information in a database  
    * A database is useful here because it is hard to corrupt: if a later retrieval fails, the data already stored stays intact.  
* Clean/process  
* Visualize  
    * In these examples, using maps, JavaScript, and D3.js  
* Analyze  

This isn't exactly data mining; it's less sophisticated than that. More sophisticated data mining technologies exist, such as Hadoop and Spark, and Python is often part of those. 
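To make the "restartable process" idea from the list above concrete, here is a minimal sketch. It is not the course's code: the `Cache` table, the `fetch` stand-in, and the `batch_size` parameter are all names I made up for illustration. The idea is to skip anything already stored and to commit after every row, so an interrupted run can simply be started again.

```python
import json
import sqlite3


def fetch(address):
    """Hypothetical stand-in for a slow network call."""
    return json.dumps({"query": address})


def load_batch(conn, addresses, batch_size=2):
    """Fetch up to batch_size addresses that are not cached yet.

    Returns the number of new rows stored in this run.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS Cache (address TEXT PRIMARY KEY, raw TEXT)"
    )
    fetched = 0
    for address in addresses:
        if fetched >= batch_size:
            break  # stop here; the next run picks up where this one left off
        cur.execute("SELECT 1 FROM Cache WHERE address = ?", (address,))
        if cur.fetchone() is not None:
            continue  # already stored by an earlier run
        cur.execute(
            "INSERT INTO Cache (address, raw) VALUES (?, ?)",
            (address, fetch(address)),
        )
        conn.commit()  # commit per row so an interruption loses at most one fetch
        fetched += 1
    return fetched


conn = sqlite3.connect(":memory:")  # a file path would be used in practice
places = ["A", "B", "C"]
first_run = load_batch(conn, places)   # stores A and B
second_run = load_batch(conn, places)  # skips A and B, stores C
```

Running the same call twice is the whole point: the second run does only the work the first one left undone.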

<img src="img/week5_02.jpg?modified=1" />

# GeoData Example

Use the Google Maps API to pull some data. Here's a diagram of the entire workflow. 

<img src="img/week5_01.jpg?modified=1" />

## `geoload.py`: Query data and store to `geodata.sqlite`
First use `geoload.py` to pull data and store it to a local cache in a database called `geodata.sqlite`.

In this case the input is a text file, `where.data`, with one location name per line. `geoload.py` passes each location name to a Google Maps API (here through the course's server at `py4e-data.dr-chuck.net`) and stores the response in the database `geodata.sqlite`.
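The request URLs that show up in the program's output can be rebuilt with the standard library. This is a sketch of how I assume the URL is put together, not a copy of `geoload.py`; the `build_url` helper is my own name, and the base URL and `key=42` are taken from the output.

```python
import urllib.parse

# Base URL as it appears in the program's output
SERVICE_URL = "http://py4e-data.dr-chuck.net/json?"


def build_url(address, key=42):
    """Encode the address and key as query parameters (spaces become '+')."""
    return SERVICE_URL + urllib.parse.urlencode({"address": address, "key": key})


url = build_url("Universidade do Minho")
print(url)
```

`urlencode` takes care of escaping spaces and non-ASCII characters, which matters for location names typed by hand.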

I ran it in VS Code. The output looks like this:

```
Retrieving http://py4e-data.dr-chuck.net/json?address=Universidade+Federal+do+Rio+Grande+do+Sul&key=42
Retrieved 1910 characters {    "results" : [  

Retrieving http://py4e-data.dr-chuck.net/json?address=Universidade+Federal+do+Rio+de+Janeiro&key=42
Retrieved 2410 characters {    "results" : [  

Retrieving http://py4e-data.dr-chuck.net/json?address=Universidade+Tecnica+de+Lisboa&key=42
Retrieved 1589 characters {    "results" : [  

Retrieving http://py4e-data.dr-chuck.net/json?address=Universidade+de+Sao+Paulo&key=42
Retrieved 1710 characters {    "results" : [  

Retrieving http://py4e-data.dr-chuck.net/json?address=Universidade+do+Minho&key=42
Retrieved 1798 characters {    "results" : [  
Pausing for a bit...

Retrieving http://py4e-data.dr-chuck.net/json?address=Universitas+Gadjah+Mada&key=42
Retrieved 2332 characters {    "results" : [  
Retrieved 200 locations, restart to retrieve more
Run geodump.py to read the data from the database so you can vizualize it on a map.
```

Let's look at the database we just created and see what is there. I'll open it in SQLiteStudio. 

<img src="img/week5_03.jpg?modified=1" />
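The same inspection can be done from Python instead of a GUI. The sketch below builds a small in-memory database with the same table and column names so that it runs on its own; pointing `sqlite3.connect` at `geodata.sqlite` should work the same way, assuming the file is in the current directory. The helper names are mine.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def list_columns(conn, table):
    """Return the column names of a table (table name must be trusted input)."""
    # PRAGMA table_info yields one row per column; index 1 is the column name
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


# Stand-in for geodata.sqlite so the example is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Locations (address TEXT, geodata TEXT)")

tables = list_tables(conn)
columns = list_columns(conn, "Locations")
print(tables, columns)
```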

There is one table called `Locations` with two columns:  
* `address`: contains the info from the `where.data` file that we passed to `geoload.py`  
* `geodata`: the JSON text returned by the API, which looks like a Python dictionary. Here is what the first entry looks like (when `address = AGH University of Science and Technology`)  

```
{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "30",
               "short_name" : "30",
               "types" : [ "street_number" ]
            },
            {
               "long_name" : "aleja Adama Mickiewicza",
               "short_name" : "aleja Adama Mickiewicza",
               "types" : [ "route" ]
            },
            {
               "long_name" : "Krowodrza",
               "short_name" : "Krowodrza",
               "types" : [ "political", "sublocality", "sublocality_level_1" ]
            },
            {
               "long_name" : "Kraków",
               "short_name" : "Kraków",
               "types" : [ "locality", "political" ]
            },
            {
               "long_name" : "Kraków",
               "short_name" : "Kraków",
               "types" : [ "administrative_area_level_2", "political" ]
            },
            {
               "long_name" : "Małopolskie",
               "short_name" : "Małopolskie",
               "types" : [ "administrative_area_level_1", "political" ]
            },
            {
               "long_name" : "Poland",
               "short_name" : "PL",
               "types" : [ "country", "political" ]
            },
            {
               "long_name" : "30-059",
               "short_name" : "30-059",
               "types" : [ "postal_code" ]
            }
         ],
         "formatted_address" : "aleja Adama Mickiewicza 30, 30-059 Kraków, Poland",
         "geometry" : {
            "location" : {
               "lat" : 50.06688579999999,
               "lng" : 19.9136192
            },
            "location_type" : "ROOFTOP",
            "viewport" : {
               "northeast" : {
                  "lat" : 50.0699639,
                  "lng" : 19.9239857
               },
               "southwest" : {
                  "lat" : 50.0643824,
                  "lng" : 19.8998463
               }
            }
         },
         "partial_match" : true,
         "place_id" : "ChIJIZu1VqdbFkcR0RezIbqNDLI",
         "plus_code" : {
            "compound_code" : "3W87+QC Kraków, Poland",
            "global_code" : "9F2X3W87+QC"
         },
         "types" : [ "establishment", "point_of_interest", "university" ]
      }
   ],
   "status" : "OK"
}
```
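Only three values from this response end up on the map: the latitude, the longitude, and the formatted address. Here is a minimal sketch of pulling them out with `json.loads`, using a trimmed copy of the entry above. The `extract_point` helper is my own and may not match how `geodump.py` does it.

```python
import json

# Trimmed copy of the response above, keeping only the fields used here
sample = """
{
   "results" : [
      {
         "formatted_address" : "aleja Adama Mickiewicza 30, 30-059 Kraków, Poland",
         "geometry" : {
            "location" : { "lat" : 50.06688579999999, "lng" : 19.9136192 }
         }
      }
   ],
   "status" : "OK"
}
"""


def extract_point(geodata_text):
    """Return (lat, lng, formatted_address), or None if the entry is unusable."""
    try:
        js = json.loads(geodata_text)
    except json.JSONDecodeError:
        return None  # not valid JSON
    if js.get("status") != "OK" or not js.get("results"):
        return None  # the API reported a failure or found nothing
    first = js["results"][0]
    location = first["geometry"]["location"]
    return location["lat"], location["lng"], first["formatted_address"]


point = extract_point(sample)
print(point)
```

Checking `status` before indexing into `results` avoids a crash on entries where the lookup failed.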

## Use `geodump.py` to parse `geodata.sqlite`; store it to `where.js`

We need to convert this into a format that can be read and displayed by the frontend website. We will use `geodump.py` to read info from the database and store it in `where.js`. 

The example `where.js` file that came with the original download looks like this:

```
myData = [
[42.340075,-71.0895367, 'Northeastern, Boston, MA 02115, USA'],
[32.778949,35.019648, 'Technion/ Sports Building, Haifa'],
[33.1561058,131.826132, 'Japan, 〒875-0002 Ōita-ken, Usuki-shi, Shitanoe, 1232−2 ＵＭＤ'],
[42.4036847,-71.120482, 'South Hall Tufts University, 30 Lower Campus Rd, Somerville, MA 02144, USA'],
[-37.914517,145.1303881, 'Monash College, Wellington Rd, Clayton VIC 3168, Australia'],
[53.2948229,69.4047872, 'Kokshetau 020000, Kazakhstan'],
[40.7127837,-74.0059413, 'New York, NY, USA']...]
```
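A file in this shape is just a JavaScript array literal assigned to `myData`, so it can be produced with plain string formatting. The sketch below is my own guess at one way to do it, not the code in `geodump.py`. Dropping single quotes from the label is an assumption I added so the quoted string stays valid JavaScript; other special characters, such as backslashes, are not handled.

```python
def format_where_js(points):
    """Build the text of a where.js-style file from (lat, lng, label) tuples."""
    lines = []
    for lat, lng, label in points:
        safe = label.replace("'", "")  # a stray quote would end the JS string early
        lines.append(f"[{lat},{lng}, '{safe}']")
    return "myData = [\n" + ",\n".join(lines) + "\n];\n"


points = [
    (42.340075, -71.0895367, "Northeastern, Boston, MA 02115, USA"),
    (40.7127837, -74.0059413, "O'Neill Hall (made-up label to show quote handling)"),
]
text = format_where_js(points)
print(text)
```

Returning the text instead of writing it makes the function easy to check before anything is saved to disk.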

Now run `geodump.py`. It prints one line per entry: the formatted address, then the latitude, then the longitude. It looks like this: 

```
HCPW+WMC, 11 Avenida, Cdad. de Guatemala 01012, Guatemala 14.5873005 -90.55336129999999
C/Plaza de Santa Cruz, 8, 47002 Valladolid, Spain 41.6569271 -4.7140547
Chía, Cundinamarca, Colombia 4.855814899999999 -74.0417628
18 Avenida 11-95 Guatemala, Cdad. de Guatemala 01015, Guatemala 14.603762 -90.48924799999999
Campus I Lot. Cidade Universitaria - Castelo Branco, João Pessoa - PB, 58051-900, Brazil -7.137748500000001 -34.8458974
R. Eng. Agronômico Andrei Cristian Ferreira, s/n - Trindade, Florianópolis - SC, 88040-900, Brazil -27.5999666 -48.5194152
Farroupilha, Porto Alegre - RS, 90040-040, Brazil -30.0339726 -51.2190483
Av. Pedro Calmon, 550 - Cidade Universitária da Universidade Federal do Rio de Janeiro, Rio de Janeiro - RJ, 21941-901, Brazil -22.8625345 -43.2234737
```

Take another look at `where.js`. The format is the same as before, but the previous entries have been replaced with the new data:

```
myData = [
[50.06688579999999,19.9136192, 'aleja Adama Mickiewicza 30, 30-059 Kraków, Poland'],
[52.2394019,21.0150792, 'Krakowskie Przedmieście 5, 00-068 Warszawa, Poland'],
[30.0185741,31.5013996, 'Plot 15 Admin building (South tower) 90 Axis, Beside FUE, in frond of AUC قسم أول القاهرة الجديدة، محافظة القاهرة‬ 4728120،، New Cairo 1, Cairo Governorate 4728120, Egypt'],
[33.4242399,-111.9280527, 'Tempe, AZ 85281, USA'],
[38.0399391,23.8030901, 'Monumental Plaza, Building C, 1st Floor, Leof. Kifisias 44, Marousi 151 25, Greece'],
[28.3588163,75.58802039999999, 'Vidya Vihar, Pilani, Rajasthan 333031, India']...]
```

## Visualize data with `where.html`

Looking at the raw HTML for this file shows a bit of what it's doing: it loads `where.js`, loads a style sheet, and uses a function called `initialize` to create each map marker. 

Here's what it looks like when I open it in a web browser:  

<img src="img/week5_04.jpg?modified=1" />

I can hover over each marker and see a label for the marker. I was able to find a few of the entries from the list above to confirm that the right data is being displayed on the map. 

# Homework: Add my own location

I'm going to add the location of one of my favorite hikes: Cooper Mountain Nature Park. 

## `where2.data`, `geoload.py`, `geoload2.sqlite`

Start by adding the location to `where.data`, then run `geoload.py`. I'm going to save the input file as `where2.data` and shorten it to speed things up. The end of `where2.data` looks like this: 

```
...
federal institute of tecnology and education from southeastern Minas Gerais
kansas state university
universidad complutense de madrid
university of Patras
university of padua
Cooper Mountain Nature Park
```

Here's what the output of `geoload.py` looks like:


<img src="img/week5_05.jpg?modified=2" />

Let's take a look at the database and make sure I see this entry. 

<img src="img/week5_06.jpg?modified=2" />

Looks good. 
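The same check can be scripted. This sketch assumes the `Locations` table described earlier. I was not sure whether the address is stored as text or as UTF-8 bytes, so the query tries both (that part is my assumption). It runs against an in-memory stand-in so that it works on its own.

```python
import sqlite3


def has_address(conn, address):
    """Return True if the address is in Locations, stored as text or as bytes."""
    row = conn.execute(
        "SELECT 1 FROM Locations WHERE address = ? OR address = ? LIMIT 1",
        (address, address.encode("utf-8")),
    ).fetchone()
    return row is not None


# In-memory stand-in for the real database file
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Locations (address TEXT, geodata TEXT)")
conn.execute(
    "INSERT INTO Locations VALUES (?, ?)",
    (b"Cooper Mountain Nature Park", "{}"),  # stored as bytes
)
conn.execute(
    "INSERT INTO Locations VALUES (?, ?)",
    ("kansas state university", "{}"),  # stored as text
)

found = has_address(conn, "Cooper Mountain Nature Park")
print(found)
```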

## `geodump.py` and `where2.js`

Now run `geodump.py` to generate a new file with our entry: `where2.js`. Here's what the output looks like:


<img src="img/week5_07.jpg?modified=1" />




## `where2.html`

Now load the data in a web browser and zoom in on the location. Do we see it? There it is!

<img src="img/week5_08.jpg?modified=1" />

Now I can upload these images for the assignment. 