# Data Pipelines
- Responsible for moving data from a source to a destination and transforming it somewhere along the way
    - So you pull data and transform it
- This is what you create and maintain as a Data Engineer

# ETL 
- A flavor of Data Pipeline
- Extracts data and transforms it before loading it into a destination
- Sources may be tabular or non-tabular

Sample ETL in Python:
```python
def load(data_frame, target_table):
    # Some custom-built Python logic to load data to SQL
    # (POSTGRES_CONNECTION is assumed to be defined elsewhere)
    data_frame.to_sql(name=target_table, con=POSTGRES_CONNECTION)
    print(f"Loading data to the {target_table} table")

# Now, run the data pipeline
# (extract() and transform() are assumed to be defined like load())
extracted_data = extract(file_name="raw_data.csv")
transformed_data = transform(data_frame=extracted_data)
load(data_frame=transformed_data, target_table="cleaned_data")
```

Output:

Extracting data from raw_data.csv

Transforming data to remove 'null' records

Loading data to the cleaned_data table
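
The sample calls `extract()` and `transform()` without defining them. Here is a minimal sketch of what they might look like, assuming pandas and assuming that "removing null records" means dropping rows with missing values via `dropna()`. The function bodies are illustrative guesses, not the original implementation:

```python
import pandas as pd


def extract(file_name):
    # Assumption: the source is a CSV file readable by pandas
    print(f"Extracting data from {file_name}")
    return pd.read_csv(file_name)


def transform(data_frame):
    # Assumption: "null records" are rows with at least one missing value
    print("Transforming data to remove 'null' records")
    return data_frame.dropna()
```

With these definitions, the print statements match the first two lines of the output shown above.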

# ELT 
- Extracts and loads the data before transforming it
- A more recent pattern, made practical by data warehouses
- Typically works with tabular data

Sample ELT:

```python
def transform(source_table, target_table):
    # f-string so the table names are substituted into the SQL
    data_warehouse.run_sql(
        f"""
        CREATE TABLE {target_table} AS
            SELECT
                <field-name>, <field-name>, ...
            FROM {source_table};
        """
    )

# Similar to ETL pipelines, call the extract, load, and transform functions
extracted_data = extract(file_name="raw_data.csv")
load(data_frame=extracted_data, table_name="raw_data")
transform(source_table="raw_data", target_table="cleaned_data")
```
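
To make the ELT order concrete, here is a minimal runnable sketch that assumes an in-memory SQLite database as a stand-in for the data warehouse. The `WAREHOUSE` connection, the inline sample rows, and the `name`/`num_firms`/`notes` columns are all hypothetical; a real warehouse would use its own connector.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the data warehouse
WAREHOUSE = sqlite3.connect(":memory:")


def load(data_frame, table_name):
    # Load the raw, untransformed data first
    data_frame.to_sql(name=table_name, con=WAREHOUSE, index=False)


def transform(source_table, target_table):
    # Transform inside the warehouse with SQL, after loading.
    # Table names are interpolated directly, so they must be trusted values.
    WAREHOUSE.execute(
        f"""
        CREATE TABLE {target_table} AS
            SELECT name, num_firms
            FROM {source_table};
        """
    )


# Hypothetical rows in place of extract(file_name="raw_data.csv")
extracted_data = pd.DataFrame(
    {"name": ["Apparel", "Banking"], "num_firms": [39, 7], "notes": ["a", "b"]}
)
load(data_frame=extracted_data, table_name="raw_data")
transform(source_table="raw_data", target_table="cleaned_data")
cleaned_rows = WAREHOUSE.execute("SELECT * FROM cleaned_data").fetchall()
```

The raw table keeps every column, while `cleaned_data` holds only the selected ones. That is the point of ELT: the transformation happens inside the warehouse.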

# Building ETL and ELT Pipelines

- First, we need to be able to pull data from a CSV file

## PULL:
```python
import pandas as pd

data_frame = pd.read_csv('raw_data.csv')
```

## TRANSFORM:
Once we have extracted the data, we can begin to filter it (you are just cleaning it here, using only `loc` and `iloc`).
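
A minimal sketch of filtering with `loc` and `iloc`. The DataFrame contents and the `region` column are made up for illustration:

```python
import pandas as pd

# Hypothetical data standing in for the extracted CSV
data_frame = pd.DataFrame(
    {
        "name": ["Apparel", "Banking", "Apparel"],
        "num_firms": [39, 7, 12],
        "region": ["US", "US", "EU"],
    }
)

# loc is label-based: a boolean mask picks rows, a list picks columns
apparel = data_frame.loc[data_frame["name"] == "Apparel", ["name", "num_firms"]]

# iloc is position-based: first two rows, first two columns
first_rows = data_frame.iloc[:2, :2]
```

`loc` is usually the better fit for cleaning by condition, while `iloc` is handy when only the position of rows or columns is known.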

## LOAD:
If you want to load the dataframe to a .csv file, you can do:

```python
data_frame.to_csv("cleaned_data.csv")
```

There are also `to_sql()` and `to_json()`.
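
A short sketch of two of these writers, using a made-up one-row DataFrame and in-memory targets instead of real files. Note that `to_csv()` writes the row index as an extra first column unless `index=False` is passed:

```python
import io
import json

import pandas as pd

# Hypothetical cleaned data
data_frame = pd.DataFrame({"name": ["Apparel"], "num_firms": [39]})

# Write CSV to an in-memory buffer, leaving out the row index
buffer = io.StringIO()
data_frame.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# to_json() returns a string when no path is given;
# orient="records" produces one JSON object per row
records = json.loads(data_frame.to_json(orient="records"))
```

`to_sql()` works the same way but needs a database connection as its `con` argument.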

## Using SQL when Transforming:
Sometimes we have to use SQL to transform data that already sits in a data warehouse:

```python
data_warehouse.execute(
    """
    CREATE TABLE total_sales AS
        SELECT
            ds,
            SUM(sales) AS total_sales
        FROM raw_sales_data
        GROUP BY ds;
    """
)
```

We can use SQLAlchemy here, or the Snowflake connector.
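
To try this kind of SQL transformation without a warehouse, here is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in connection. The sample rows are made up, and a real setup would swap in a SQLAlchemy or Snowflake connection instead:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the warehouse connection
data_warehouse = sqlite3.connect(":memory:")
data_warehouse.execute("CREATE TABLE raw_sales_data (ds TEXT, sales REAL)")
data_warehouse.executemany(
    "INSERT INTO raw_sales_data VALUES (?, ?)",
    # Made-up rows for illustration
    [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
)

# Aggregate sales per day into a new table
data_warehouse.execute(
    """
    CREATE TABLE total_sales AS
        SELECT ds, SUM(sales) AS total_sales
        FROM raw_sales_data
        GROUP BY ds;
    """
)
totals = data_warehouse.execute(
    "SELECT ds, total_sales FROM total_sales ORDER BY ds"
).fetchall()
```

Each `ds` value ends up as one row in `total_sales`, holding the sum of that day's sales.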

## Putting it all together:
```python
# Define extract(), transform(), load() functions

def transform(data_frame, value):
    return data_frame.loc[data_frame['name'] == value, ['name', 'num_firms']]

# first, extract data from .csv
extracted_data = extract(file_name='raw_data.csv')

# then, transform the 'extracted_data'
transformed_data = transform(data_frame = extracted_data, value = 'Apparel')

# finally, load the transformed data
load(data_frame=transformed_data, file_name='cleaned_data.csv')
```

Another example:

```python
import pandas as pd

def extract(file_name):
  return pd.read_csv(file_name)

def transform(data_frame):
  return data_frame.loc[:, ["industry_name", "number_of_firms"]]

def load(data_frame, file_name):
  data_frame.to_csv(file_name)
  
extracted_data = extract(file_name="raw_industry_data.csv")
transformed_data = transform(data_frame=extracted_data)

# Pass the transformed_data DataFrame to the load() function
load(data_frame=transformed_data, file_name="number_of_firms.csv")
```