# Data Import Tools 

This notebook contains a refined version of the cell you used in the Cabrillo Courses notebook a couple of weeks ago. It imports each CSV file into a corresponding table. Use it to get your project started.

Start by running the next cell to get set up.

In [3]:
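# Load the SQL extension, which provides the %sql and %%sql magics.
%load_ext sql
# Return at most 500 rows from each query.
%config SqlMagic.autolimit=500

import re
import pathlib
import subprocess
import folium
import folium.plugins
import pandas as pd
from sqlalchemy import create_engine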



## Import Your Data Files 

On my Jupyter server you can drag and drop data files next to this notebook and use them in the import cell below. If you're using Google Colaboratory, connect your Google Drive to the notebook. I recorded how to do this in class. Here's a video showing the process:

https://youtu.be/mNTqIw-Oy44
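The video shows how to connect Drive through Colab's interface. If you'd rather do it in code, Colab's `google.colab.drive.mount` function is one way. This optional sketch is separate from the import cell: it assumes Colab's usual mount point `/content/drive` with your files under `MyDrive` (check the Files pane if your path differs), and outside Colab it falls back to the notebook's own folder. It only lists the CSV files it finds so you can copy their paths.

```python
import pathlib

# Mount Google Drive only when running inside Colab; anywhere else (for
# example on a Jupyter server) fall back to the notebook's own folder.
try:
    from google.colab import drive  # this module exists only in Colab

    drive.mount("/content/drive")  # asks you to authorize access
    data_dir = pathlib.Path("/content/drive/MyDrive")  # assumed default location
except ImportError:
    data_dir = pathlib.Path(".")

# List the CSV files found there so you can copy their paths into datafiles.
csv_files = sorted(str(p) for p in data_dir.glob("*.csv"))
print(csv_files)
```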

Change the `datafiles` list in the next cell to name your own files. You can list as many files as you like, but at least one. Put each file name (including its path if you're in Colab) in quotes, and separate the names with commas.

**This cell may take a long time if you have lots of data.** 

In [8]:
# Change these to match your data files.
datafiles = ["MasterCourseFile.csv", "ProgramFile.csv", "ProgramCourseFile.csv"]

# This will create a file called projdata.sqlite3 with your schema.
url = 'sqlite:///projdata.sqlite3'
engine = create_engine(url, echo=False)
for f in map(pathlib.Path, datafiles):
    # Read the whole CSV into a DataFrame.
    df = pd.read_csv(f)
    # Write it to a table named after the file, minus its path and .csv
    # extension, replacing any table of that name that already exists.
    df.to_sql(f.stem, con=engine, if_exists='replace')

The previous cell creates one table for each CSV file. The table will have the same name as the CSV without the file path and the `.csv` extension. Here are examples to show you what your tables are named:

| File Name | Table Name | 
| --- | --- | 
| ../../labs/MasterCourseFile.csv | MasterCourseFile | 
| ProgramCourseFile.csv | ProgramCourseFile | 
| /data/drive/My Drive/MyData.csv | MyData | 

Re-running the previous cell reloads the CSV data, replacing whatever was in those tables before. If you reformat your CSV files (for example, by changing the columns), the import may fail. You can fix that by deleting the `projdata.sqlite3` file and re-running the previous cell.
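For reference, here is a standalone sketch of what the import loop does, written as a function so it can be tested outside the notebook. Several things are assumptions rather than part of the original cell: the function name `import_csvs` and its `chunksize` parameter are hypothetical; it uses a standard-library `sqlite3` connection (which `DataFrame.to_sql` accepts) and an in-memory database instead of the SQLAlchemy engine and `projdata.sqlite3`; and it can read large files in chunks, which may help with the slow-import warning above. Table names come from `pathlib.Path.stem`, which drops the path and extension as in the naming table.

```python
import pathlib
import sqlite3
import tempfile

import pandas as pd


def import_csvs(datafiles, con, chunksize=None):
    """Load each CSV into a table named after the file (hypothetical helper).

    The table name drops the file's path and .csv extension. If chunksize is
    given, each CSV is read in pieces of that many rows so a large file never
    has to fit in memory all at once.
    """
    for f in map(pathlib.Path, datafiles):
        table = f.stem  # e.g. "../../labs/MasterCourseFile.csv" -> "MasterCourseFile"
        if chunksize is None:
            chunks = [pd.read_csv(f)]
        else:
            chunks = pd.read_csv(f, chunksize=chunksize)
        for i, chunk in enumerate(chunks):
            # The first piece replaces any existing table; later pieces append.
            chunk.to_sql(table, con=con, if_exists="replace" if i == 0 else "append")


# Demo: a throwaway CSV and an in-memory database instead of projdata.sqlite3.
csv_path = pathlib.Path(tempfile.mkdtemp()) / "ProgramFile.csv"
csv_path.write_text("id,name\n1,Art\n2,Biology\n3,Chemistry\n")

con = sqlite3.connect(":memory:")
import_csvs([csv_path], con, chunksize=2)
import_csvs([csv_path], con, chunksize=2)  # re-running replaces, so no duplicate rows
print(con.execute("SELECT COUNT(*) FROM ProgramFile").fetchone()[0])  # 3
```

To start over from an empty database, you can also delete the file from Python with `pathlib.Path("projdata.sqlite3").unlink(missing_ok=True)` (Python 3.8 or later) before re-running the import.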



## Use and Query Your Data 

SQL cells should start with the `%%sql` command shown in the cell below: 

In [None]:
%%sql sqlite:///projdata.sqlite3
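A `%%sql` cell is the simplest way to query your tables. If you'd rather get results back as a pandas DataFrame for further work in Python, one option is `pandas.read_sql_query` with a standard-library `sqlite3` connection. In this minimal sketch the helper name `run_query` is hypothetical, and the `SqlMagic.autolimit` setting from the setup cell does not apply to it. The `SELECT` statement that lists your tables also works as-is in a `%%sql` cell.

```python
import contextlib
import sqlite3

import pandas as pd


def run_query(sql, db_path="projdata.sqlite3"):
    """Run one SQL query against a SQLite file and return a DataFrame."""
    # sqlite3.connect creates an empty database if db_path is missing,
    # so run the import cell first.
    with contextlib.closing(sqlite3.connect(db_path)) as con:
        return pd.read_sql_query(sql, con)


# List every table the import cell created (e.g. MasterCourseFile).
tables = run_query("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
print(tables)
```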
