# Week 1

```yml
title: "Prep Air's Flow Card"
week: 1
posted_on: 2024-01-03
created_on: 2024-03-07
last_updated: 2024-03-08
input: [
    "PD 2024 Wk 1 Input.csv"
]
output: [
    "passenger_flight_details.ndjson"
]
```

## Set up the notebook

In [1]:
from pathlib import Path

import polars as pl

## Parameters

Directory names.

In [2]:
ROOT = Path("../..")
DATA = "data"
INPUT = "input"
OUTPUT = "output"

Input and output file names.

In [3]:
PASSENGER_FLIGHT_DETAILS_CSV = "PD 2024 Wk 1 Input.csv"
PASSENGER_FLIGHT_DETAIL_NDJSON = "passenger_flight_details.ndjson"
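
As a quick illustration of how these parameters combine, the sketch below joins them with `pathlib`'s `/` operator, the same way the load and export cells do. The variable names `input_path` and `output_path` are introduced here for illustration only and do not appear in the notebook.

```python
from pathlib import Path

ROOT = Path("../..")
DATA = "data"
INPUT = "input"
OUTPUT = "output"

# Hypothetical names, used only in this sketch.
input_path = ROOT / DATA / INPUT / "PD 2024 Wk 1 Input.csv"
output_path = ROOT / DATA / OUTPUT / "passenger_flight_details.ndjson"

print(input_path.as_posix())   # ../../data/input/PD 2024 Wk 1 Input.csv
print(output_path.as_posix())  # ../../data/output/passenger_flight_details.ndjson
```

Because `ROOT` is relative, both paths resolve against the directory the notebook is run from.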

## Load the data

In [4]:
raw_data = pl.scan_csv(ROOT / DATA / INPUT / PASSENGER_FLIGHT_DETAILS_CSV)

## Preprocess the data

In [5]:
pre_data = raw_data.with_row_index("id").rename(
    {
        "Flight Details": "flight_details",
        "Flow Card?": "has_flow_card",
        "Bags Checked": "number_of_bags_checked",
        "Meal Type": "meal_type",
    }
)

## Transform the data

In [6]:
# Split "date//flight_number//from-to//class//price" on "//" and cast the
# resulting list to a fixed-width array of 5 strings.
clean_data = pre_data.with_columns(
    pl.col("flight_details").str.split("//").cast(pl.Array(str, 5))
).select(
    "id",
    pl.col("flight_details").arr.get(0).str.to_date().alias("date"),
    pl.col("flight_details").arr.get(1).alias("flight_number"),
    # The route field is "<from>-<to>"; each capture group allows up to two words.
    pl.col("flight_details").arr.get(2).str.extract(r"(\w+\s?\w*)-").alias("from"),
    pl.col("flight_details").arr.get(2).str.extract(r"-(\w+\s?\w*)").alias("to"),
    pl.col("flight_details").arr.get(3).alias("class"),
    pl.col("flight_details").arr.get(4).cast(float).alias("price"),
    # Cast the flow card flag to a boolean.
    pl.col("has_flow_card").cast(pl.Boolean),
    "number_of_bags_checked",
    "meal_type",
)
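
To make the split and the two regular expressions easier to check by hand, here is a plain-Python sketch that applies the same steps to a single value. It is not part of the notebook: `parse_flight_details` is a hypothetical helper, the sample string is made up to match the `//`-separated shape the code expects, and the explicit length check is this sketch's own choice.

```python
import re
from datetime import date


def parse_flight_details(value: str) -> dict:
    """Mirror the Polars expressions above for one string (illustrative only)."""
    parts = value.split("//")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {value!r}")
    flight_date, flight_number, route, travel_class, price = parts

    # Same patterns as the notebook: one or two words on each side of "-".
    origin = re.search(r"(\w+\s?\w*)-", route)
    destination = re.search(r"-(\w+\s?\w*)", route)

    return {
        "date": date.fromisoformat(flight_date),
        "flight_number": flight_number,
        "from": origin.group(1) if origin else None,
        "to": destination.group(1) if destination else None,
        "class": travel_class,
        "price": float(price),
    }


# Made-up sample value, not taken from the input file.
sample = "2024-07-22//PA010//Tokyo-New York//Economy//2380.0"
row = parse_flight_details(sample)
print(row["from"], "->", row["to"])  # Tokyo -> New York
```

One consequence of the patterns worth knowing: a city name with three or more words would be captured only partially, since each group matches at most two words.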

## Export the data

We export only the full dataset, as a single NDJSON file, rather than following the exercise's output instructions.

In [7]:
clean_data.collect().write_ndjson(ROOT / DATA / OUTPUT / PASSENGER_FLIGHT_DETAIL_NDJSON)