## Introduction to Databases

### Using [DoltHub](https://www.dolthub.com/)

[Tutorial](https://www.dolthub.com/docs/tutorials/dolthub/)  
[Python Reference](https://www.dolthub.com/docs/reference/python/)

In [6]:
!pip install -U doltcli doltpy



In [2]:
from doltpy.core import Dolt
from doltpy.core.write import bulk_import

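ModuleNotFoundError: No module named 'doltpy.core'

This error suggests that the installed doltpy release does not ship the `doltpy.core` module; the cells below assume a version that still provides it.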

In [None]:
import requests
import json
import time
import pandas as pd
from doltpy.core import Dolt
from doltpy.core.read import read_table, pandas_read_sql

### [Installing Dolt Locally](https://www.dolthub.com/docs/tutorials/installation/)

    sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | bash'
    
### [Installing DoltPy](https://www.dolthub.com/docs/tutorials/installation/#doltpy)

    pip install doltpy
    
### [Signing Up](https://www.dolthub.com/signin)

#### [Data Acquisition](https://www.dolthub.com/docs/tutorials/dolthub/#data-acquisition)

There are several ways to explore data on DoltHub:

+ Navigate to the database [home page](https://www.dolthub.com/repositories/dolthub/corona-virus) and use the SQL Console  
+ Use the [Dolt CLI](https://www.dolthub.com/docs/tutorials/dolthub/#data-acquisition_dolt-cli)  
+ Use the DoltHub [API](https://www.dolthub.com/docs/tutorials/dolthub/#api-alpha) with `requests`

In [2]:
api_root = 'https://dolthub.com/api/v1alpha1/{}/{}'
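The `requests` calls in this notebook pass the SQL text through `params={'q': ...}`. As a minimal sketch of the URL that results, the helper below assembles it with the standard library only. The name `build_query_url` is hypothetical; the host and `v1alpha1` path are copied from the cells in this notebook.

```python
from urllib.parse import quote, urlencode

# Host and API version as used in the request cells of this notebook
API_ROOT = "https://www.dolthub.com/api/v1alpha1"


def build_query_url(owner, repo, branch=None, query=None):
    """Hypothetical helper: build a DoltHub SQL API URL without sending a request."""
    parts = [owner, repo] + ([branch] if branch else [])
    url = API_ROOT + "/" + "/".join(quote(p, safe="") for p in parts)
    if query:
        # URL-encode the SQL text as the 'q' query-string parameter
        url += "?" + urlencode({"q": query})
    return url
```

Calling `build_query_url('dolthub', 'corona-virus', 'master', 'SHOW TABLES;')` gives a URL that can be pasted into a browser or passed to `requests.get`.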

Using the [corona-virus](https://www.dolthub.com/repositories/dolthub/corona-virus) repository

In [3]:
owner = 'dolthub'
repo = 'corona-virus'
branch = 'master'

In [4]:
res = requests.get(api_root.format(owner, repo))

In [5]:
for key in res.json().keys():
    print(key)

query_execution_status
query_execution_message
repository_owner
repository_name
commit_ref
sql_query
schema
rows
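The keys above include a status and a message alongside the rows, so it may be worth checking them before building a DataFrame. This is a sketch under assumptions: `rows_or_raise` is a hypothetical helper, and `'Success'` is the only status value that appears in this notebook, so any other value is treated as a failure.

```python
def rows_or_raise(payload):
    """Return payload['rows'] if the query succeeded, else raise with the API's message."""
    status = payload.get("query_execution_status")
    # Assumption: anything other than 'Success' means the query did not run cleanly
    if status != "Success":
        message = payload.get("query_execution_message", "")
        raise RuntimeError(f"DoltHub query failed ({status}): {message}")
    return payload.get("rows", [])
```

With a live response this would be used as `rows = rows_or_raise(res.json())`.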


In [6]:
res.json()

{'query_execution_status': 'Success',
 'query_execution_message': '',
 'repository_owner': 'dolthub',
 'repository_name': 'corona-virus',
 'commit_ref': 'master',
 'sql_query': 'SHOW TABLES;',
 'schema': [{'columnName': 'Table',
   'columnType': 'String',
   'isPrimaryKey': False}],
 'rows': [{'Table': 'case_details'},
  {'Table': 'cases'},
  {'Table': 'cases_by_age_range'},
  {'Table': 'cases_by_age_sex'},
  {'Table': 'cases_by_sex'},
  {'Table': 'characteristics_age'},
  {'Table': 'characteristics_case_severity'},
  {'Table': 'characteristics_comorbid_condition'},
  {'Table': 'characteristics_occupation'},
  {'Table': 'characteristics_onset_date_range'},
  {'Table': 'characteristics_province'},
  {'Table': 'characteristics_sex'},
  {'Table': 'characteristics_wuhan_exposed'},
  {'Table': 'current'},
  {'Table': 'current_cases'},
  {'Table': 'current_deaths'},
  {'Table': 'current_recovered'},
  {'Table': 'deaths_by_age_range'},
  {'Table': 'deaths_by_age_sex'},
  {'Table': 'deaths_by_

In [7]:
query = '''
SELECT cases.confirmed_count, cases.death_count, places.country_region  
FROM cases
INNER JOIN places
ON places.place_id = cases.place_id
WHERE places.place_id = 98
LIMIT 100
'''

res = requests.get('https://www.dolthub.com/api/v1alpha1/{}/{}/{}'.format(owner, repo, branch), params={'q': query})

In [8]:
#res.json()

In [9]:
df = pd.DataFrame(res.json()['rows'])
df.head(10)

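|  | confirmed_count | death_count | country_region |
|---|---|---|---|
| 0 | 0 | 0.0 | Brazil |
| 1 | 1 |  | Brazil |
| 2 | 2 |  | Brazil |
| 3 | 4 |  | Brazil |
| 4 | 13 |  | Brazil |
| 5 | 20 |  | Brazil |
| 6 | 25 |  | Brazil |
| 7 | 31 |  | Brazil |
| 8 | 38 |  | Brazil |
| 9 | 52 |  | Brazil |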


In [10]:
query = '''
select * from time_series where country_region = "Brazil"
'''
res = requests.get('https://www.dolthub.com/api/v1alpha1/{}/{}/{}'.format(owner, repo, branch), params={'q': query})

In [11]:
#res.json()
df = pd.DataFrame(res.json()['rows'])
df.head(10)

|  | country_region | province_state | observation_time | confirmed_count | death_count | recovered_count |
|---|---|---|---|---|---|---|
| 0 | Brazil |  | Wed, 22 Jan 2020 00:00:00 GMT | 0 | 0.0 | 0.0 |
| 1 | Brazil |  | Wed, 26 Feb 2020 00:00:00 GMT | 1 |  |  |
| 2 | Brazil |  | Sat, 29 Feb 2020 00:00:00 GMT | 2 |  |  |
| 3 | Brazil |  | Wed, 04 Mar 2020 00:00:00 GMT | 4 |  |  |
| 4 | Brazil |  | Fri, 06 Mar 2020 00:00:00 GMT | 13 |  |  |
| 5 | Brazil |  | Sun, 08 Mar 2020 00:00:00 GMT | 20 |  |  |
| 6 | Brazil |  | Mon, 09 Mar 2020 00:00:00 GMT | 25 |  |  |
| 7 | Brazil |  | Tue, 10 Mar 2020 00:00:00 GMT | 31 |  |  |
| 8 | Brazil |  | Wed, 11 Mar 2020 00:00:00 GMT | 38 |  |  |
| 9 | Brazil |  | Thu, 12 Mar 2020 00:00:00 GMT | 52 |  |  |
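The `observation_time` values arrive as HTTP-style date strings rather than datetime objects. As a small sketch, the standard library's `email.utils.parsedate_to_datetime` can parse this format; the two sample rows are copied from the output above, and `parse_observation_time` is a hypothetical name.

```python
from email.utils import parsedate_to_datetime


def parse_observation_time(value):
    """Parse a date string such as 'Wed, 22 Jan 2020 00:00:00 GMT' into a datetime."""
    return parsedate_to_datetime(value)


# Two rows copied from the time_series output in this notebook
rows = [
    {"observation_time": "Wed, 22 Jan 2020 00:00:00 GMT"},
    {"observation_time": "Wed, 26 Feb 2020 00:00:00 GMT"},
]
dates = [parse_observation_time(r["observation_time"]).date() for r in rows]
```

Having real dates makes it possible to sort, resample, or plot the series by time instead of by string order.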


### [Cloning Data Locally with Dolt CLI inside Jupyter](https://deepnote.com/project/cacec925-c951-4d1e-bbf5-eaeaa9b1e8fc#%2Fdolt-demo.ipynb)

In [12]:
!rm -rf nba-players/
!dolt clone dolthub/nba-players

cloning https://doltremoteapi.dolthub.com/dolthub/nba-players
Retrieving remote information
22 of 2,772 chunks complete. 3 chunks being downloaded currently

In [13]:
# Instantiate a Dolt repository object using your cloned repository
repo = Dolt('nba-players')

# Read a table into a dataframe
players_df = read_table(repo, 'players')

print(players_df.head())

11-10 13:18:52 doltpy.core.dolt INFO     Creating engine for Dolt SQL Server instance running on 127.0.0.1:3306
11-10 13:18:52 doltpy.core.dolt INFO     


   id       full_name first_name last_name  is_active
0   2     Byron Scott      Byron     Scott          0
1   3      Grant Long      Grant      Long          0
2   7     Dan Schayes        Dan   Schayes          0
3   9  Sedale Threatt     Sedale   Threatt          0
4  12      Chris King      Chris      King          0


In [14]:
type(players_df)

pandas.core.frame.DataFrame

In [15]:
# Run SQL using Doltpy

# Start a MySQL compatible server
repo.sql_server()

# Give the server a bit of time to start up
time.sleep(3)

# Define a SQL query
query = '''
  SELECT *
  FROM players
  WHERE full_name = 'Michael Jordan'
'''

# Retrieve a dataframe using the query 
# and the MySQL connector that ships with Doltpy
df = pandas_read_sql(query, repo.get_engine())

print(df.head())

# Stop the SQL Server when we're done
repo.sql_server_stop()

    id       full_name first_name last_name  is_active
0  893  Michael Jordan    Michael    Jordan          0
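The fixed `time.sleep(3)` in the cell above is a guess at how long the server needs to start. One alternative, sketched here under assumptions, is to poll the port until it accepts a TCP connection. `wait_for_port` is a hypothetical helper, and the default host and port are taken from the log line earlier in this notebook (`127.0.0.1:3306`).

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=3306, timeout=10.0, interval=0.2):
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait briefly and retry
            time.sleep(interval)
    return False
```

In the cell above, `time.sleep(3)` could then be replaced by `wait_for_port()`, with a check of the return value before running the query. A successful connection only shows that the port is open, not that the server is ready to answer SQL.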


In [16]:
# You can also use the DoltHub SQL API, as before

owner, reponame = 'dolthub', 'nba-players'

# Every repository on DoltHub has an API that accepts SQL queries and returns JSON
# The limit on query time is 20 seconds so don't get crazy
api_root = 'https://www.dolthub.com/api/v1alpha1/{}/{}'
res = requests.get(api_root.format(owner, reponame), params={'q': query})

goat_json = res.json()

print(goat_json.get('rows'))

[{'id': '893', 'full_name': 'Michael Jordan', 'first_name': 'Michael', 'last_name': 'Jordan', 'is_active': '0'}]
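Note that the API output above returns every value as a string, including `id` and `is_active`. A minimal sketch of converting selected columns to integers follows; `coerce_row` is a hypothetical helper, and mapping empty values to `None` is an assumption about how missing data should be handled.

```python
def coerce_row(row, int_columns):
    """Return a copy of row with the named columns converted from str to int.

    Empty or missing values become None instead of raising.
    """
    out = dict(row)
    for col in int_columns:
        value = out.get(col)
        out[col] = int(value) if value not in (None, "") else None
    return out


# Row copied from the API output in this notebook
row = {'id': '893', 'full_name': 'Michael Jordan', 'first_name': 'Michael',
       'last_name': 'Jordan', 'is_active': '0'}
typed = coerce_row(row, ["id", "is_active"])
```

The same idea applies after loading the rows into pandas, where the columns would otherwise hold strings.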


In [17]:
df = pd.DataFrame(goat_json['rows'])
df.head(10)

|  | id | full_name | first_name | last_name | is_active |
|---|---|---|---|---|---|
| 0 | 893 | Michael Jordan | Michael | Jordan | 0 |


#### That's all good, but you can also modify the data without fear

#### Using the Dolt CLI...

In [23]:
# Put the repo back to starting state
!cd nba-players && dolt reset --hard

In [24]:
# Run a query against the repo
!cd nba-players && dolt sql -q "alter table players add column goat_debate tinyint"
!cd nba-players && dolt diff

diff --dolt a/players b/players
--- a/players @ d996j2u7hf4qfm6ja9hjbes80tgaarb9
+++ b/players @ o9njlk9l3cp45pbn7vgk6unutcsvomfs
  CREATE TABLE players (
    `id` BIGINT NOT NULL
    `full_name` LONGTEXT
    `first_name` LONGTEXT
    `last_name` LONGTEXT
    `is_active` TINYINT
+   `goat_debate` TINYINT
     PRIMARY KEY (id)
  );

+-----+----+-----------+------------+-----------+-----------+-------------+
| [31m < [0m | id | full_name | first_name | last_name | is_active |             |
| [32m > [0m | id | full_name | first_name | last_name | is_active | goat_debate |
+-----+----+-----------+------------+-----------+-----------+-------------+
+-----+----+-----------+------------+-----------+-----------+-------------+
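The `!cd nba-players && dolt ...` lines only work inside a notebook. Outside Jupyter, the same CLI calls could be made with `subprocess`, as in this sketch; `dolt_command` and `run_command` are hypothetical helpers, and the code assumes the `dolt` binary is on the `PATH`.

```python
import subprocess


def dolt_command(*args):
    """Build the argument list for a dolt CLI call, e.g. dolt_command('diff')."""
    return ["dolt", *args]


def run_command(cmd, cwd="."):
    """Run cmd in directory cwd and return its stdout.

    Raises subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `print(run_command(dolt_command('diff'), cwd='nba-players'))` would mirror the `dolt diff` call above. Passing the arguments as a list avoids shell quoting problems with SQL strings.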


In [25]:
# Or modify it using Doltpy...
goats_update = '''
  update players
  set goat_debate=1
  where full_name = 'Michael Jordan' 
  OR full_name = 'LeBron James'
'''

repo.sql(goats_update)

11-10 13:20:21 doltpy.core.dolt INFO     Query OK, 2 rows affected



In [26]:
repo.sql_server()
time.sleep(3)

goats_df = pandas_read_sql("select * from players where goat_debate=1", repo.get_engine())

print(goats_df)

repo.sql_server_stop()

     id       full_name first_name last_name  is_active  goat_debate
0  2544    LeBron James     LeBron     James          1            1
1   893  Michael Jordan    Michael    Jordan          0            1
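To try the same SQL steps (add a column, update two rows, select the flagged rows) without a Dolt installation, the sketch below uses Python's built-in `sqlite3` as a stand-in. It is not Dolt and has no versioning or `diff`; the three sample players are copied from outputs in this notebook, and the table is reduced to three columns for brevity.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for the Dolt repository
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, full_name TEXT, is_active TINYINT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [(2, "Byron Scott", 0), (893, "Michael Jordan", 0), (2544, "LeBron James", 1)],
)

# Same schema change as the dolt sql -q "alter table ..." call
conn.execute("ALTER TABLE players ADD COLUMN goat_debate TINYINT")

# Same update as goats_update, with the names bound as parameters
conn.execute(
    "UPDATE players SET goat_debate = 1 WHERE full_name IN (?, ?)",
    ("Michael Jordan", "LeBron James"),
)

goats = conn.execute(
    "SELECT id, full_name, goat_debate FROM players WHERE goat_debate = 1 ORDER BY id"
).fetchall()
# Rows that were not updated keep NULL in the new column
unflagged = conn.execute("SELECT COUNT(*) FROM players WHERE goat_debate IS NULL").fetchone()[0]
```

Rows left untouched by the update hold `NULL` in the new column, which is why the filter `goat_debate=1` returns only the two updated players.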
