# Creating a Postgres database to store storm data
## Introduction
The goal of this project is to create an efficient Postgres database to store storm data that currently lives in a CSV file.
Steps:
* Data exploration
* Selection of column data types
* Database creation and user management
* Insertion of data

## Exploring the data

In [3]:
import pandas as pd
data = pd.read_csv('storm_data.csv')
data.head()

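| | FID | YEAR | MONTH | DAY | AD_TIME | BTID | NAME | LAT | LONG | WIND_KTS | PRESSURE | CAT | BASIN | Shape_Leng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2001 | 1957 | 8 | 8 | 1800Z | 63 | NOTNAMED | 22.5 | -140.0 | 50 | 0 | TS | Eastern Pacific | 1.140175 |
| 1 | 2002 | 1961 | 10 | 3 | 1200Z | 116 | PAULINE | 22.1 | -140.2 | 45 | 0 | TS | Eastern Pacific | 1.16619 |
| 2 | 2003 | 1962 | 8 | 29 | 0600Z | 124 | C | 18.0 | -140.0 | 45 | 0 | TS | Eastern Pacific | 2.10238 |
| 3 | 2004 | 1967 | 7 | 14 | 0600Z | 168 | DENISE | 16.6 | -139.5 | 45 | 0 | TS | Eastern Pacific | 2.12132 |
| 4 | 2005 | 1972 | 8 | 16 | 1200Z | 251 | DIANA | 18.5 | -139.8 | 70 | 0 | H1 | Eastern Pacific | 1.702939 |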


### Data Dictionary
* fid - ID for the row
* year - Recorded year
* month - Recorded month
* day - Recorded day of the month
* ad_time - Recorded time in UTC
* btid - Hurricane ID
* name - Name of the hurricane
* lat - Latitude of the recorded location
* long - Longitude of the recorded location
* wind_kts - Wind speed in knots
* pressure - Atmospheric pressure of the hurricane
* cat - Hurricane category
* basin - The basin in which the hurricane is located
* shape_leng - Hurricane shape length

Before picking the most storage-efficient data type for each column, let's look at the range of values in the numeric columns.

In [12]:
data.describe()

| | FID | YEAR | MONTH | DAY | BTID | LAT | LONG | WIND_KTS | PRESSURE | Shape_Leng |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 59228.0 | 59228.0 | 59228.0 | 59228.0 | 59228.0 | 59228.0 | 59228.0 | 59228.0 | 59228.0 | 59228.0 |
| mean | 29614.5 | 1957.194874 | 8.540521 | 15.867326 | 648.398899 | 23.5264 | -83.196863 | 54.726802 | 372.3368 | 1.201987 |
| std | 17097.795209 | 41.665792 | 1.364174 | 8.793432 | 372.376803 | 9.464955 | 37.282152 | 25.133577 | 480.562974 | 0.834497 |
| min | 1.0 | 1851.0 | 1.0 | 1.0 | 1.0 | 4.2 | -180.0 | 10.0 | 0.0 | 0.0 |
| 25% | 14807.75 | 1928.0 | 8.0 | 8.0 | 344.0 | 16.1 | -108.5 | 35.0 | 0.0 | 0.707107 |
| 50% | 29614.5 | 1970.0 | 9.0 | 16.0 | 606.0 | 21.2 | -81.2 | 50.0 | 0.0 | 1.029563 |
| 75% | 44421.25 | 1991.0 | 9.0 | 23.0 | 920.0 | 29.6 | -62.2 | 70.0 | 990.0 | 1.431782 |
| max | 59228.0 | 2008.0 | 12.0 | 31.0 | 1410.0 | 69.0 | 180.0 | 165.0 | 1024.0 | 11.18034 |


## Selection of datatypes
* FID goes up to 59228, beyond the SMALLINT maximum of 32767, so let's pick INTEGER
* YEAR, MONTH, DAY and AD_TIME can be combined into a single TIMESTAMP
* BTID, WIND_KTS and PRESSURE are whole numbers with maxima of 1410, 165 and 1024, well within the SMALLINT range (-32768 to 32767)
* LAT and LONG have clear physical bounds (±90 and ±180) and are stored with a single decimal place, so DECIMAL(4,1)
* Shape_Leng has up to 6 digits after the decimal point and a maximum of 11.180340, making it a good candidate for DECIMAL(8,6)
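As a quick sanity check on the choices above, the sketch below tests the minimum and maximum values reported by `data.describe()` against each candidate type. The helper names `fits_smallint` and `fits_decimal` are made up for this illustration. The bounds are PostgreSQL's 2-byte SMALLINT range and the usual reading of DECIMAL(precision, scale).

```python
from decimal import Decimal

# PostgreSQL SMALLINT is a 2-byte signed integer
SMALLINT_MIN, SMALLINT_MAX = -32768, 32767


def fits_smallint(lo, hi):
    """Return True if every integer in [lo, hi] fits in a SMALLINT."""
    return SMALLINT_MIN <= lo and hi <= SMALLINT_MAX


def fits_decimal(value, precision, scale):
    """Return True if value can be stored exactly in DECIMAL(precision, scale)."""
    d = Decimal(str(value))
    # Reject values with more fractional digits than the scale allows
    if d != d.quantize(Decimal(10) ** -scale):
        return False
    # At most (precision - scale) digits are allowed before the decimal point
    return abs(d) < Decimal(10) ** (precision - scale)


# (min, max) pairs copied from the describe() output
integer_ranges = {
    "FID": (1, 59228),
    "BTID": (1, 1410),
    "WIND_KTS": (10, 165),
    "PRESSURE": (0, 1024),
}
for column, (lo, hi) in integer_ranges.items():
    print(column, "fits SMALLINT:", fits_smallint(lo, hi))

# (min, max, precision, scale) for the decimal columns
decimal_ranges = {
    "LAT": (4.2, 69.0, 4, 1),
    "LONG": (-180.0, 180.0, 4, 1),
    "Shape_Leng": (0.0, 11.18034, 8, 6),
}
for column, (lo, hi, precision, scale) in decimal_ranges.items():
    ok = fits_decimal(lo, precision, scale) and fits_decimal(hi, precision, scale)
    print(column, f"fits DECIMAL({precision},{scale}):", ok)
```

Only FID should fail the SMALLINT check, which is why it gets INTEGER instead.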

## Creating database and analyst user
Let's now create a dedicated database for the storm data, along with the table inside it that will hold the records.
### Creating database

In [1]:
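import psycopg2

# Connect to the default "postgres" database to issue database-level commands
conn = psycopg2.connect(dbname="postgres", user="postgres")
# CREATE/DROP DATABASE cannot run inside a transaction block, so enable autocommit
conn.autocommit = True
cursor = conn.cursor()
cursor.execute("DROP DATABASE IF EXISTS storm")
cursor.execute("CREATE DATABASE storm OWNER postgres")
conn.close()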

OperationalError: could not connect to server: Connection refused (0x0000274D/10061)
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5432?
could not connect to server: Connection refused (0x0000274D/10061)
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
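The traceback above means nothing accepted the TCP connection on localhost:5432, most likely because the PostgreSQL server was not running when the cell was executed. A small standard-library check like the one below can confirm that before retrying. The function name `port_is_open` is hypothetical, and the host and port are simply the ones from the error message.

```python
import socket


def port_is_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection tries each address the host name resolves to
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:5432")
    else:
        print("Nothing is listening on localhost:5432; start the PostgreSQL server first")
```

An open port only shows that some process is listening there, not that it is PostgreSQL or that the credentials are valid, so the `psycopg2.connect` call can still fail for other reasons.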


### Creating table

In [None]:
conn = psycopg2.connect(dbname="storm", user="postgres")
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS storm")
cur.execute("""CREATE TABLE storm(
                FID INTEGER PRIMARY KEY,
                UTC_TIME TIMESTAMP,
                BTID SMALLINT,
                LAT DECIMAL(4,1),
                LONG DECIMAL(4,1),
                WIND_KTS SMALLINT,
                PRESSURE SMALLINT,
                Shape_Leng DECIMAL(8,6))""")
conn.commit()
# Query zero rows just to inspect the column metadata of the new table
cur.execute('SELECT * FROM storm LIMIT 0')
print(cur.description)
conn.close()
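
With the table in place, each CSV row has to be reshaped to match its columns before insertion: YEAR, MONTH, DAY and AD_TIME collapse into UTC_TIME, and the remaining values are cast to the chosen types. The sketch below shows one way this might look. `row_to_record`, `insert_records` and `INSERT_SQL` are hypothetical names, AD_TIME is assumed to always look like `1800Z` (HHMM followed by Z), and `insert_records` expects an open psycopg2 connection to the `storm` database.

```python
from datetime import datetime
from decimal import Decimal

# Parameterized statement matching the column order of the storm table
INSERT_SQL = (
    "INSERT INTO storm (FID, UTC_TIME, BTID, LAT, LONG, WIND_KTS, PRESSURE, Shape_Leng) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
)


def row_to_record(row):
    """Convert one CSV row (a dict of strings) into a tuple for INSERT_SQL."""
    # Assumption: AD_TIME is formatted as HHMM followed by a literal 'Z'
    hour, minute = int(row["AD_TIME"][:2]), int(row["AD_TIME"][2:4])
    utc_time = datetime(int(row["YEAR"]), int(row["MONTH"]), int(row["DAY"]), hour, minute)
    return (
        int(row["FID"]),
        utc_time,
        int(row["BTID"]),
        Decimal(row["LAT"]),
        Decimal(row["LONG"]),
        int(row["WIND_KTS"]),
        int(row["PRESSURE"]),
        Decimal(row["Shape_Leng"]),
    )


def insert_records(conn, records):
    """Insert prepared records through an open psycopg2 connection."""
    with conn.cursor() as cur:
        cur.executemany(INSERT_SQL, records)
    conn.commit()


# First row of storm_data.csv as shown by data.head()
sample = {
    "FID": "2001", "YEAR": "1957", "MONTH": "8", "DAY": "8", "AD_TIME": "1800Z",
    "BTID": "63", "LAT": "22.5", "LONG": "-140.0", "WIND_KTS": "50",
    "PRESSURE": "0", "Shape_Leng": "1.140175",
}
print(row_to_record(sample))
```

The input rows could come from `csv.DictReader` over `storm_data.csv`. `executemany` keeps the example short; for bulk loads, psycopg2's `psycopg2.extras.execute_values` or `cursor.copy_expert` are usually faster.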