## Cheat Sheet

The more topical the workshop, the more likely it is I'll be looking things up, copying, and pasting. Here's an overview of the common commands, imports, and methods you will need and probably won't remember.

### Imports

The sqlite3 module, to connect to a local SQLite database.

In [6]:
import sqlite3

pandas, for dataframe operations.

In [7]:
import pandas as pd

The pandasql module, to run SQL queries directly on dataframes.

In [8]:
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

### Dataset (sql-like) operations using pandas dataframes

You don't have to leave pandas if you'd prefer to stick with Python and not use SQL. For a good overview of equivalent operations in pandas and SQL, see:

https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html
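
As a rough illustration of that mapping, the sketch below pairs a few common SQL clauses with one possible pandas equivalent. The small `toy_df` frame and its values are made up for this example and are not the workshop data.

```python
import pandas as pd

# Hypothetical toy data standing in for a surveys-like table
toy_df = pd.DataFrame({
    "species_id": ["DM", "DM", "NL", "RM"],
    "year": [1977, 1978, 1977, 2002],
    "weight": [40.0, 44.0, 150.0, 14.0],
})

# SELECT species_id, weight FROM toy WHERE year = 1977
selected = toy_df.loc[toy_df["year"] == 1977, ["species_id", "weight"]]

# SELECT species_id, AVG(weight) FROM toy GROUP BY species_id
mean_weight = toy_df.groupby("species_id")["weight"].mean()

# SELECT * FROM toy ORDER BY weight DESC LIMIT 2
heaviest = toy_df.sort_values("weight", ascending=False).head(2)
```

Each of these returns a new object and leaves `toy_df` untouched, which is worth keeping in mind for the sorting cell further down.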

Load CSV files directly into a pandas dataframe

In [11]:
species_df = pd.read_csv('data/species.csv')
surveys_df = pd.read_csv('data/surveys.csv')

In [21]:
# stack rows vertically (like SQL UNION ALL): first 5 and last 5 survey records
vertical_stack = pd.concat([surveys_df.head(5), surveys_df.tail(5)], axis=0)

In [22]:
vertical_stack

Unnamed: 0,record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
0,1,7,16,1977,2,NL,M,32.0,
1,2,7,16,1977,3,NL,M,33.0,
2,3,7,16,1977,2,DM,F,37.0,
3,4,7,16,1977,7,DM,M,36.0,
4,5,7,16,1977,3,DM,M,35.0,
35544,35545,12,31,2002,15,AH,,,
35545,35546,12,31,2002,15,AH,,,
35546,35547,12,31,2002,10,RM,F,15.0,14.0
35547,35548,12,31,2002,7,DO,M,36.0,51.0
35548,35549,12,31,2002,5,,,,
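
Note that the stacked output keeps the row labels of the original pieces (0 to 4, then 35544 to 35548). A minimal sketch of how `ignore_index` changes that, using two hypothetical frames (`top` and `bottom`) rather than the real survey data:

```python
import pandas as pd

# Hypothetical two-row frames; each gets its own default 0..1 index
top = pd.DataFrame({"record_id": [1, 2], "species_id": ["NL", "DM"]})
bottom = pd.DataFrame({"record_id": [35548, 35549], "species_id": ["DO", None]})

# Default: the original row labels are carried over, so labels can repeat
stacked = pd.concat([top, bottom], axis=0)

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Repeated labels are legal in pandas, but they can surprise you when selecting with `.loc`, so renumbering is often the safer default when the old index carries no meaning.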


In [23]:
# inner join (the pd.merge default): keep only species_id values found in both frames
inner_join_df = pd.merge(left=species_df, right=surveys_df, on='species_id')

In [24]:
inner_join_df.head(10)

Unnamed: 0,species_id,genus,species,taxa,record_id,month,day,year,plot_id,sex,hindfoot_length,weight
0,AB,Amphispiza,bilineata,Bird,3126,7,21,1980,8,,,
1,AB,Amphispiza,bilineata,Bird,3146,7,21,1980,24,,,
2,AB,Amphispiza,bilineata,Bird,3152,7,21,1980,19,,,
3,AB,Amphispiza,bilineata,Bird,3153,7,21,1980,22,,,
4,AB,Amphispiza,bilineata,Bird,3586,12,15,1980,16,,,
5,AB,Amphispiza,bilineata,Bird,3702,1,11,1981,22,,,
6,AB,Amphispiza,bilineata,Bird,3705,1,11,1981,22,,,
7,AB,Amphispiza,bilineata,Bird,3706,1,11,1981,20,,,
8,AB,Amphispiza,bilineata,Bird,3775,1,12,1981,6,,,
9,AB,Amphispiza,bilineata,Bird,4499,6,4,1981,23,,,


In [12]:
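# indexes, sorting
# sort_values() and reset_index() return new dataframes; inner_join_df itself
# stays unchanged unless you assign the result back to it
sorted_df = inner_join_df.sort_values(['record_id'])
sorted_df = sorted_df.reset_index(drop=True)  # renumber rows 0..n-1 after sorting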

In [29]:
inner_join_df.head(10)

Unnamed: 0,species_id,genus,species,taxa,record_id,month,day,year,plot_id,sex,hindfoot_length,weight
0,AB,Amphispiza,bilineata,Bird,3126,7,21,1980,8,,,
1,AB,Amphispiza,bilineata,Bird,3146,7,21,1980,24,,,
2,AB,Amphispiza,bilineata,Bird,3152,7,21,1980,19,,,
3,AB,Amphispiza,bilineata,Bird,3153,7,21,1980,22,,,
4,AB,Amphispiza,bilineata,Bird,3586,12,15,1980,16,,,
5,AB,Amphispiza,bilineata,Bird,3702,1,11,1981,22,,,
6,AB,Amphispiza,bilineata,Bird,3705,1,11,1981,22,,,
7,AB,Amphispiza,bilineata,Bird,3706,1,11,1981,20,,,
8,AB,Amphispiza,bilineata,Bird,3775,1,12,1981,6,,,
9,AB,Amphispiza,bilineata,Bird,4499,6,4,1981,23,,,


In [30]:
# left outer join: keep every row of species_df, even species with no survey records
left_outer_join_df = pd.merge(left=species_df, right=surveys_df, how='left',
                              on='species_id')
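
To see what the left outer join adds over the inner join, here is a small sketch on made-up tables (`toy_species` and `toy_surveys` are hypothetical, and "XX" is an invented species with no survey records). It assumes the `indicator=True` option of `pd.merge`, which adds a `_merge` column saying where each row came from.

```python
import pandas as pd

# Hypothetical toy tables; "XX" never appears in the surveys
toy_species = pd.DataFrame({
    "species_id": ["DM", "NL", "XX"],
    "genus": ["Dipodomys", "Neotoma", "Unknown"],
})
toy_surveys = pd.DataFrame({
    "record_id": [1, 2, 3],
    "species_id": ["NL", "DM", "DM"],
})

# Inner join drops "XX"; left join keeps it with missing survey columns
inner = pd.merge(toy_species, toy_surveys, on="species_id", how="inner")
left = pd.merge(toy_species, toy_surveys, on="species_id", how="left",
                indicator=True)

# Rows found only in the left table have NaN in the right-hand columns
unmatched = left[left["_merge"] == "left_only"]
print(unmatched)
```

Filtering on `left_only` is a quick way to list species that were never surveyed, the pandas counterpart of `LEFT JOIN ... WHERE right.key IS NULL` in SQL.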

In [None]:
left_outer_joi

In [14]:
# direct db connections
conn = sqlite3.connect("data/portal_mammals.sqlite")
c = conn.cursor()

In [16]:
#for row in c.execute('SELECT * FROM species LIMIT 5'):
#    print(row, ':', row[0])
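
The commented-out loop above iterates over a cursor, where each row comes back as a plain tuple. A minimal sketch of the same idea against an in-memory database, so it runs without the workshop file; the two-row `species` table here is a hypothetical stand-in, and `cursor.description` is used to recover the column names.

```python
import sqlite3

# In-memory database with a hypothetical species table
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE species (species_id TEXT, genus TEXT, taxa TEXT)")
demo_cur.executemany(
    "INSERT INTO species VALUES (?, ?, ?)",
    [("AB", "Amphispiza", "Bird"), ("DM", "Dipodomys", "Rodent")],
)

demo_cur.execute("SELECT * FROM species ORDER BY species_id LIMIT 5")

# description holds one entry per result column; the name is the first item
column_names = [col[0] for col in demo_cur.description]

# Each row is a tuple, so pair it with the names to make it readable
rows = demo_cur.fetchall()
for row in rows:
    print(dict(zip(column_names, row)))

demo_conn.close()
```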

In [18]:
# query results straight into a pandas dataframe
df_species = pd.read_sql_query("SELECT * FROM species", conn)
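
If the query needs a value filled in, `pd.read_sql_query` accepts a `params` argument, which is generally safer than building the SQL string by hand. A small sketch, again on a hypothetical in-memory table rather than the workshop database:

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical species table
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE species (species_id TEXT, taxa TEXT)")
demo_conn.executemany(
    "INSERT INTO species VALUES (?, ?)",
    [("AB", "Bird"), ("DM", "Rodent"), ("NL", "Rodent")],
)
demo_conn.commit()

# The ? placeholder is filled from params by the sqlite3 driver
rodents_df = pd.read_sql_query(
    "SELECT * FROM species WHERE taxa = ? ORDER BY species_id",
    demo_conn,
    params=("Rodent",),
)
print(rodents_df)

demo_conn.close()
```

The `?` placeholder style is specific to sqlite3; other database drivers may use a different marker.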

In [20]:
# create a view: a saved query that is re-evaluated every time it is used
c.execute("CREATE VIEW distinct_taxa_view AS SELECT DISTINCT taxa FROM species")

<sqlite3.Cursor at 0x1231512d0>

In [21]:
# create a table: a stored snapshot of the query result at creation time
c.execute("CREATE TABLE distinct_taxa_table AS SELECT DISTINCT taxa FROM species")

<sqlite3.Cursor at 0x1231512d0>
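
The practical difference between the two shows up once the underlying data changes. The sketch below is one way to check it on a throwaway in-memory database; the rows are invented for the example. It also uses `IF NOT EXISTS`, which SQLite accepts for both statements, so the cell can be re-run without an "already exists" error.

```python
import sqlite3

# In-memory database with a hypothetical species table
demo_conn = sqlite3.connect(":memory:")
cur = demo_conn.cursor()
cur.execute("CREATE TABLE species (species_id TEXT, taxa TEXT)")
cur.executemany(
    "INSERT INTO species VALUES (?, ?)",
    [("AB", "Bird"), ("DM", "Rodent")],
)

# IF NOT EXISTS makes both statements safe to run more than once
cur.execute(
    "CREATE VIEW IF NOT EXISTS distinct_taxa_view AS "
    "SELECT DISTINCT taxa FROM species"
)
cur.execute(
    "CREATE TABLE IF NOT EXISTS distinct_taxa_table AS "
    "SELECT DISTINCT taxa FROM species"
)

# Add a new taxa value after the view and the table were created
cur.execute("INSERT INTO species VALUES ('CS', 'Reptile')")
demo_conn.commit()

# The view reflects the new row; the table is a frozen copy
view_count = cur.execute("SELECT COUNT(*) FROM distinct_taxa_view").fetchone()[0]
table_count = cur.execute("SELECT COUNT(*) FROM distinct_taxa_table").fetchone()[0]
print(view_count, table_count)

demo_conn.close()
```

Closing the connection when you are done releases the database file, which matters more with the on-disk workshop database than with this in-memory one.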