# Database

Create a SQLite database that we can use for faster lookups. The basic pandas calls for writing and reading:

```
df.to_sql("table_name", conn, if_exists="replace")               # write (use "append" to add to an existing table)
pd.read_sql_query("select * from table_name limit 1;", conn)     # read
```
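
As a rough illustration of the two `if_exists` modes, here is a minimal round trip. It assumes an in-memory database and a made-up two-row frame, so the table contents are purely illustrative.

```python
import sqlite3

import pandas as pd

# In-memory database and toy data: both are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"tweetID": [1, 2], "message": ["hello", "world"]})

# "replace" drops any existing table and recreates it from the frame.
df.to_sql("table_name", conn, if_exists="replace", index=False)
# "append" adds the rows to the table that is already there.
df.to_sql("table_name", conn, if_exists="append", index=False)

row_count = pd.read_sql_query("select count(*) as n from table_name;", conn)
first_row = pd.read_sql_query("select * from table_name limit 1;", conn)
conn.close()
```

Passing `index=False` keeps the DataFrame index from being written as an extra column.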

In [1]:
# Reset the database file (only needed for a fresh build)
!rm -f ../../data/canonical/tweets.db
!touch ../../data/canonical/tweets.db

In [2]:
# Libraries

%run utilities.py
import sqlite3

In [3]:
# Directories
indir = make_new_dir_date(processed_finals_dir)
combined_f = name_file_path('combined.csv', indir)

outdir = data_dir + 'canonical/'

Directory already exists, but you can still have the file name


#### Steps

1. Open connection
2. Create schemas
3. Iterate through the processed files, reading & saving them to SQLite
4. Close the connection
5. ???
6. Profit
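
Steps 1 to 4 can be sketched end to end with only the standard library. The `batches` list below is a hypothetical stand-in for the processed files, and the table is trimmed to three columns; `contextlib.closing` is one way to make sure the connection is closed even if an insert fails.

```python
import sqlite3
from contextlib import closing

# Hypothetical stand-in for the processed files: each "file" is a list of rows.
batches = [
    [(1, "2016-01-01", "alice"), (2, "2016-01-01", "bob")],
    [(3, "2016-01-02", "alice")],
]

# Step 1: open the connection; closing() guarantees step 4 on exit.
with closing(sqlite3.connect(":memory:")) as conn:
    # Step 2: create the schema (trimmed to three columns for brevity).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Raw(tweetID INTEGER, date TEXT, username TEXT);"
    )
    # Step 3: load each batch and commit after it.
    for rows in batches:
        conn.executemany("INSERT INTO Raw VALUES (?, ?, ?);", rows)
        conn.commit()
    total = conn.execute("SELECT COUNT(*) FROM Raw;").fetchone()[0]
```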

In [4]:
# Schemas
raw_schema = """CREATE TABLE IF NOT EXISTS Raw(
  tweetID INTEGER,
  date TEXT,
  username TEXT,
  message TEXT,
  retweet INTEGER,
  longitude REAL,
  latitude REAL
);""".replace('\n', '')

tweets_schema = """CREATE TABLE IF NOT EXISTS Tweets(
  tweetID INTEGER REFERENCES Raw(tweetID),
  message TEXT,
  retweet INTEGER,
  longitude REAL,
  latitude REAL,
  date TEXT
);""".replace('\n', '')

users_schema = """CREATE TABLE IF NOT EXISTS Users(
  username TEXT REFERENCES Raw(username),
  tweetID INTEGER
);""".replace('\n', '')

dates_schema = """CREATE TABLE IF NOT EXISTS Dates(
  date TEXT REFERENCES Raw(date),
  tweetID INTEGER
);""".replace('\n', '')
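
A caveat on the `REFERENCES` clauses: by default SQLite parses them but does not enforce them unless `PRAGMA foreign_keys = ON` is set on the connection, and the referenced column has to be a PRIMARY KEY or UNIQUE. The sketch below shows enforcement on a trimmed-down pair of tables. The `PRIMARY KEY` on `Raw.tweetID` is an assumption added for the demo and is not part of the schemas above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement is per connection and has to be switched on explicitly.
conn.execute("PRAGMA foreign_keys = ON;")

# Assumption: tweetID is declared PRIMARY KEY so it can act as a parent key.
conn.execute("CREATE TABLE Raw(tweetID INTEGER PRIMARY KEY, message TEXT);")
conn.execute(
    "CREATE TABLE Tweets(tweetID INTEGER REFERENCES Raw(tweetID), message TEXT);"
)

conn.execute("INSERT INTO Raw VALUES (1, 'hello');")
conn.execute("INSERT INTO Tweets VALUES (1, 'hello');")  # parent row exists

try:
    conn.execute("INSERT INTO Tweets VALUES (99, 'orphan');")  # no parent row
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

conn.close()
```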

In [5]:
# Database beginnings
db_f = name_file_path('tweets.db', outdir)
conn = sqlite3.connect(db_f)

In [6]:
## Create database tables
cur = conn.cursor()
for schema in [raw_schema, tweets_schema, users_schema, dates_schema]:
    try:
        cur.execute(schema)
    except sqlite3.Error as e:
        print(schema, e)
conn.commit()
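
A quick way to confirm that the tables were created is to query SQLite's built-in `sqlite_master` catalog. The sketch below uses an in-memory database and placeholder one-column tables rather than the real schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for tweets.db
for name in ["Raw", "Tweets", "Users", "Dates"]:
    # Placeholder one-column tables; the real schemas have more columns.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name}(tweetID INTEGER);")
conn.commit()

# sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    )
]
conn.close()
```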

In [7]:
# Insert data to DB
df = pd.read_csv(combined_f, low_memory=False, engine='c')
save_df_to_sql(df, conn)
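
`save_df_to_sql` comes from `utilities.py`, which is not shown here. The following is only a guess at what such a helper might do, assuming the frame has the same columns as the `Raw` schema and that each table receives its matching subset of columns. The name `save_df_to_sql_sketch` and the toy data are hypothetical.

```python
import sqlite3

import pandas as pd


def save_df_to_sql_sketch(df, conn):
    """Hypothetical stand-in for save_df_to_sql; the real helper may differ."""
    # Assumption: df carries the same columns as the Raw schema.
    df.to_sql("Raw", conn, if_exists="append", index=False)
    tweet_cols = ["tweetID", "message", "retweet", "longitude", "latitude", "date"]
    df[tweet_cols].to_sql("Tweets", conn, if_exists="append", index=False)
    df[["username", "tweetID"]].to_sql("Users", conn, if_exists="append", index=False)
    df[["date", "tweetID"]].to_sql("Dates", conn, if_exists="append", index=False)


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
toy = pd.DataFrame(
    {
        "tweetID": [1, 2],
        "date": ["2016-01-01", "2016-01-02"],
        "username": ["alice", "bob"],
        "message": ["hello", "world"],
        "retweet": [0, 1],
        "longitude": [-73.9, -0.1],
        "latitude": [40.7, 51.5],
    }
)
save_df_to_sql_sketch(toy, conn)
n_users = conn.execute("SELECT COUNT(*) FROM Users;").fetchone()[0]
conn.close()
```

Using `if_exists="append"` here keeps any tables that already exist and only adds rows to them.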

In [8]:
# close up shop
conn.close()