# Data Sources

_Orion_ is Peak's go-to framework for reading, writing and transferring data across various sources.

In this tutorial we go over some of the most fundamental operations we perform at Peak, such as reading data from and writing data to S3, Redshift and AIS.

## 1. Reading Data In

### 1.1 S3

```python
from orion.sources import S3Source

weather = S3Source(
    bucket='kilimanjaro-prod-datalake',
    key='newstarter/uploads/weather/1581525070376_Peak_weather.csv'
).read_csv()
```

### 1.2 Redshift

Unlike S3, Redshift requires certain environment variables to be set before any read or write operation can run. We load them into the notebook with the `load_env()` function.

#### 1.2.1 Using a Query

```python
from orion.contrib.envs import load_env
load_env()

from orion.sources import RedshiftSource

sql_query = """
    select
        *
    from
        stage.weather
    """

weather = RedshiftSource(query=sql_query).read_csv()
```

#### 1.2.2 Using a SQL File

```python
from orion.contrib.envs import load_env
load_env()

from orion.sources import RedshiftSource

sql_file = "resources/weather.sql"

weather = RedshiftSource(query_file=sql_file).read_csv()
```
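The `query_file` variant expects the SQL to already live in a file. If `resources/weather.sql` does not exist yet, a minimal sketch for creating it with the same query as in 1.2.1 is shown below; the path and the query text are only assumptions taken from the examples above, so adjust them to your project.

```python
from pathlib import Path

# Assumed location, matching the `sql_file` value used above
sql_file = Path("resources/weather.sql")

# Create the `resources/` folder if needed, then write the query into the file
sql_file.parent.mkdir(parents=True, exist_ok=True)
sql_file.write_text(
    "select\n"
    "    *\n"
    "from\n"
    "    stage.weather\n"
)
```

Keeping longer queries in `.sql` files like this keeps the notebook readable and lets the same query be reused across notebooks.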

## 2. Writing Data Out 

### 2.1 S3

```python
from orion.sources import S3Source

S3Source(
    bucket="kilimanjaro-prod-datalake",
    key="newstarter/datascience/weather.csv"
).write_csv(weather, index=False)
```

### 2.2 Redshift

To write to Redshift, you need a special environment variable called _Redshift IAM Role_.

To get this value, raise a support ticket (https://peak-bi.atlassian.net/servicedesk/customer/portals).

Once a member of the DevOps team sends it to you, do the following:

1. Click on the `+` icon in the top left corner.
2. Click on Terminal.
3. Type in `cd ~/` to make sure you are in the user root directory.
4. Type in `nano .env`.
5. At the bottom of the file, add `export REDSHIFT_IAM_ROLE=<redshift-iam-role>`, replacing `<redshift-iam-role>` with your _Redshift IAM Role_.
6. Save the changes and exit: press `CTRL+X`, then `Y`, then `ENTER`.
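Once `~/.env` has been edited, it can be worth confirming that the variable actually reaches the notebook before attempting a write. The sketch below assumes that `load_env()` exports the contents of `~/.env` into `os.environ`; `missing_env_vars` is a hypothetical helper, not part of _Orion_.

```python
import os


def missing_env_vars(names):
    """Return the names that are unset or empty in os.environ (hypothetical helper)."""
    return [name for name in names if not os.environ.get(name)]


# Assumption: run this after `load_env()`, which is expected to populate os.environ
missing = missing_env_vars(["REDSHIFT_IAM_ROLE"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If `REDSHIFT_IAM_ROLE` is reported as missing, re-check the `export` line in `~/.env` and run `load_env()` again.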

#### 2.2.1 Existing Table

We create a fake data sample to show how new rows can be appended to an existing table.

```python
from orion.contrib.envs import load_env
load_env()

from orion.sources import RedshiftSource

import numpy as np
import pandas as pd

from datetime import datetime

# Take a single timestamp so every date/time field describes the same instant
now = datetime.now()

# Build one fake row by sampling each measurement from the existing data
new_sample = pd.DataFrame([
    {
        'origin': np.random.choice(weather['origin']),
        'year': now.year,
        'month': now.month,
        'day': now.day,
        'hour': now.hour,
        'temp': np.random.choice(weather['temp']),
        'dewp': np.random.choice(weather['dewp']),
        'humid': np.random.choice(weather['humid']),
        'wind_dir': np.random.choice(weather['wind_dir']),
        'wind_speed': np.random.choice(weather['wind_speed']),
        'wind_gust': np.random.choice(weather['wind_gust']),
        'precip': np.random.choice(weather['precip']),
        'pressure': np.random.choice(weather['pressure']),
        'visib': np.random.choice(weather['visib']),
        # Note: %Z renders as an empty string for a naive datetime such as datetime.now()
        'time_hour': now.strftime("%Y-%m-%dT%H:%M:%S%Z")
    }
], columns=weather.columns)

# Append the new row to the existing stage.weather table
RedshiftSource(schema='stage', table='weather').write_csv(new_sample, index=False)
```
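The cell above builds a single row by hand. To append several fake rows at once, the same idea (sampling each column independently from the existing data) can be wrapped in a small function. `make_fake_rows` is a hypothetical helper written for this tutorial, not part of _Orion_; note that it also samples the date fields, so overwrite them afterwards if you want current timestamps.

```python
import numpy as np
import pandas as pd


def make_fake_rows(df, n, seed=None):
    """Build `n` fake rows by sampling each column of `df` independently (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Each column is sampled on its own, so rows mix values from different original rows
    return pd.DataFrame(
        {col: rng.choice(df[col].to_numpy(), size=n) for col in df.columns},
        columns=df.columns,
    )
```

For example, `make_fake_rows(weather, 10)` could be passed to the same `RedshiftSource(schema='stage', table='weather').write_csv(..., index=False)` call used above.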

#### 2.2.2 New Table

_Orion_ automatically creates a new table from the DataFrame's schema if the `table` parameter does not match an existing table.

The `overwrite=True` flag truncates the table and then copies the contents of the DataFrame into it.

```python
from orion.contrib.envs import load_env
load_env()

from orion.sources import RedshiftSource

RedshiftSource(table="new_weather", schema="stage", overwrite=True).write_csv(weather, index=False)
```
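Because the new table's schema is derived from the DataFrame, it helps to check `weather.dtypes` before writing. The sketch below previews a `CREATE TABLE` statement from those dtypes; the pandas-to-Redshift type mapping is an assumption made for illustration and may differ from the types _Orion_ actually chooses. `redshift_type` and `preview_create_table` are hypothetical helpers.

```python
import pandas as pd
from pandas.api import types as ptypes


def redshift_type(dtype):
    """Map a pandas dtype to a plausible Redshift type (assumed mapping, not Orion's)."""
    if ptypes.is_bool_dtype(dtype):
        return "BOOLEAN"
    if ptypes.is_integer_dtype(dtype):
        return "BIGINT"
    if ptypes.is_float_dtype(dtype):
        return "DOUBLE PRECISION"
    if ptypes.is_datetime64_any_dtype(dtype):
        return "TIMESTAMP"
    # Fall back to text for object/string columns
    return "VARCHAR(256)"


def preview_create_table(df, schema, table):
    """Return a CREATE TABLE statement built from the DataFrame's dtypes (preview only)."""
    columns = ",\n".join(
        f"    {name} {redshift_type(dtype)}" for name, dtype in df.dtypes.items()
    )
    return f"CREATE TABLE {schema}.{table} (\n{columns}\n);"
```

For example, `print(preview_create_table(weather, "stage", "new_weather"))` shows which columns may end up as text, which is often a sign that a numeric or date column was parsed as strings.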

### 2.3 AIS

Make sure to study Notebook **4. AIS API KEY** before proceeding.

```python
import os

from orion.contrib.envs import load_env
load_env()

from orion.sources import AISSource

AISSource(
    target="weather/weather_20200227.csv",
    key=os.environ['AIS_API_KEY'],
    token=os.environ['AIS_API_KEY']
).write_csv(weather, index=False)
```

You can verify that the upload succeeded by going to _AIS > Outcomes > Downloads_.
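The `target` in the cell above hardcodes the date `20200227`, so re-running the notebook on another day would write to the same file. A minimal sketch for building the file name from the current date is shown below; the `<prefix>/<name>_<YYYYMMDD>.csv` pattern is only inferred from that single example, and `dated_target` is a hypothetical helper.

```python
from datetime import date


def dated_target(prefix, name, day=None):
    """Build a target path such as 'weather/weather_20200227.csv' (hypothetical helper)."""
    # Default to today's date when no day is given
    day = day or date.today()
    return f"{prefix}/{name}_{day:%Y%m%d}.csv"
```

The result of `dated_target("weather", "weather")` could then be passed as the `target` argument of `AISSource`.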