# Storing Storm Data

## About:

International Hurricane Watchgroup (IHW) has been asked to update their analysis tools. Because of the increase in public awareness of hurricanes, they are required to be more diligient with the analysis of historical hurricane data they share across the organization, and want to productionize their services.

Current method of sharing the data with their data anaylsts has been to save a CSV file on their local servers and have every data analyst pull the data down. Then, each analyst uses a local SQLite engine to store the CSV, run their queries, and send their results around.

## Goal:
Convert data file into PostgreSQL database

### [Dataset](https://dq-content.s3.amazonaws.com/251/storm_data.csv) description:
- fid - ID for the row
- year - Recorded year
- month - Recorded month
- day - Recorded date
- ad_time - Recorded time in UTC
- btid - Hurricane ID
- name - Name of the hurricane
- lat - Latitude of the recorded location
- long - Longitude of the recorded location
- wind_kts - Wind speed in knots per second
- pressure - Atmospheric pressure of the hurricane
- cat - Hurricane category
- basin - The basin the hurricane is located
- shape_leng - Hurricane shape length

#### Importing libs

In [54]:
import io
import csv
import psycopg2
from datetime import datetime
from urllib import request

#### Downloading and examining the data

In [16]:
response = request.urlopen("https://dq-content.s3.amazonaws.com/251/storm_data.csv")
reader = csv.reader(io.TextIOWrapper(response))

In [17]:
# check max len of name, cat columns
max_len_name = 0
max_len_cat = 0
for line in reader:
    if len(line[6]) > max_len_name:
        max_len_name = len(line[6])
    if len(line[11]) > max_len_cat:
        max_len_cat = len(line[1])

In [18]:
print(max_len_name)
print(max_len_cat)

9
4


#### Connecting to local database

Database IWH was created directly in psql via console

In [109]:
conn = psycopg2.connect(dbname='ihw', user='postgres', password='1138') # need to place your own password
cur = conn.cursor()

#### Creating table for data storing

In [110]:
query = """
        DROP TABLE IF EXISTS storm_data;
        """
cur.execute(query)
conn.commit()

In [111]:
query = """
        CREATE TABLE storm_data (fid INTEGER PRIMARY KEY,
                                 datetime_utc TIMESTAMP,
                                 btid INTEGER,
                                 name VARCHAR(9),
                                 lat REAL,
                                 long REAL,
                                 wind_kts INTEGER,
                                 pressure INTEGER,
                                 cat VARCHAR(4),
                                 basin TEXT,
                                 shape_leng REAL
                                )
        """
cur.execute(query)
conn.commit()

#### Reading data to table

In [112]:
response = request.urlopen("https://dq-content.s3.amazonaws.com/251/storm_data.csv")
reader = csv.reader(io.TextIOWrapper(response))

In [113]:
next(reader)
for line in reader:
    import_data = line[:1]
    import_data.append(datetime(int(line[1]), int(line[2]), int(line[3]), int(line[4][:2]), int(line[4][2:-2])))
    for i in range(5, 14):
        import_data.append(line[i])
    cur.execute("INSERT INTO storm_data VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", import_data)
conn.commit()    

In [114]:
cur.execute("SELECT * FROM storm_data LIMIT 0")
cur.description

(Column(name='fid', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None),
 Column(name='datetime_utc', type_code=1114, display_size=None, internal_size=8, precision=None, scale=None, null_ok=None),
 Column(name='btid', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None),
 Column(name='name', type_code=1043, display_size=None, internal_size=9, precision=None, scale=None, null_ok=None),
 Column(name='lat', type_code=700, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None),
 Column(name='long', type_code=700, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None),
 Column(name='wind_kts', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None),
 Column(name='pressure', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None),
 Column(name='cat', type_code=1043, display_size=None, internal_size

In [115]:
cur.execute("SELECT COUNT(fid) FROM storm_data")
print(cur.fetchall())

[(59228,)]


### Creating a read-only user that can interact with data

In [116]:
query = """
        CREATE USER data_viewer WITH PASSWORD 'pass';
        REVOKE ALL ON storm_data FROM data_viewer;
        GRANT SELECT ON storm_data TO data_viewer;
        """
cur.execute(query)
conn.commit()

In [117]:
conn.close()