# Summary: Reading and writing CSVs

### Importing Data Efficiently
Instead of typing data into a DataFrame by hand, you can load it directly from an external file such as a CSV.

### Understanding CSV Files
CSV (comma-separated values) files are a common format for storing tabular data. Each line represents a row, and each value within a row is separated by a comma. CSVs are widely supported across databases, programming languages, and data analysis tools, making them ideal for sharing data.
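To make the row-and-comma layout concrete, here is a minimal sketch that parses a tiny CSV with Python's standard `csv` module. The sample text (dog names, breeds, heights) is invented for illustration and is not part of the course data.

```python
import csv
import io

# A tiny, made-up CSV: one header line, then one line per row.
csv_text = "name,breed,height_cm\nBella,Labrador,56\nCharlie,Poodle,43\n"

# csv.reader splits each line on commas and yields a list of strings.
rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

print(header)
print(records)
```

Every value comes back as a string here; deciding that `56` is a number is left to whatever tool reads the file.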

### Loading CSVs into DataFrames
You can read a CSV into a pandas DataFrame with `pd.read_csv()`, which takes the file path as its first argument.
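As a rough illustration, the sketch below calls `pd.read_csv` on an in-memory buffer instead of a file on disk so that it runs anywhere. The column names mirror the exercise data, but the airlines and numbers are made up.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk.
csv_text = """airline,year,nb_bumped,total_passengers
AIRLINE A,2016,10,100000
AIRLINE A,2017,30,200000
AIRLINE B,2017,5,50000
"""

# read_csv accepts a file path or any file-like object such as StringIO.
flights = pd.read_csv(io.StringIO(csv_text))

print(flights.head())
print(flights.dtypes)  # numeric columns are inferred from the text
```

With a real file, the buffer would simply be replaced by a path string such as the one used in the exercise.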

### Manipulating Imported Data
Once the data is in a DataFrame, you can use pandas functions to manipulate or enhance it, such as adding new calculated columns.
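A calculated column can be sketched as follows. The DataFrame is built by hand from invented values purely to keep the example self-contained; in practice it would come from a loaded CSV.

```python
import pandas as pd

# Hypothetical data, constructed by hand for illustration only.
flights = pd.DataFrame(
    {
        "airline": ["AIRLINE A", "AIRLINE A", "AIRLINE B"],
        "nb_bumped": [10, 30, 5],
        "total_passengers": [100000, 200000, 50000],
    }
)

# Arithmetic on columns is applied row by row; assigning the result
# to a new column name adds that column to the DataFrame.
flights["bumps_per_10k"] = (
    flights["nb_bumped"] / flights["total_passengers"] * 10000
)

print(flights)
```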

### Exporting Data to CSV
After modifying a DataFrame, you can save it back to a CSV file to share or store the updated data. The DataFrame's `.to_csv()` method writes it to the file path you specify.
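The sketch below shows what `to_csv` produces for a small, made-up DataFrame. Calling it without a path returns the CSV text as a string, which makes the effect of the `index` parameter easy to see; the data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical totals with the airline name as the index,
# similar in shape to a grouped summary.
totals = pd.DataFrame(
    {"nb_bumped": [40, 5]},
    index=pd.Index(["AIRLINE A", "AIRLINE B"], name="airline"),
)

# With no path given, to_csv returns the CSV content as a string.
csv_with_index = totals.to_csv()                 # index written as first column
csv_without_index = totals.to_csv(index=False)   # index dropped

print(csv_with_index)
print(csv_without_index)
```

When the index holds meaningful labels, as it does after a `groupby`, keeping it in the output is usually what you want; `index=False` is more useful when the index is just a default row counter.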


## Exercise: CSV to DataFrame

You are analyzing airline passenger data to see how frequently travelers were involuntarily denied boarding in 2016 and 2017. The data is provided as a CSV file, and your task is to load it into a pandas DataFrame, summarize it by airline, and calculate the number of passengers bumped per 10,000 passengers.

### Instructions

1. Load the CSV file into a pandas DataFrame.
2. Inspect the first few rows of the DataFrame.
3. Group the data by airline and calculate the total number of passengers bumped and total passengers for each airline.
4. Create a new column that shows the number of passengers bumped per 10,000 passengers for each airline.
5. Display the summarized DataFrame.


In [2]:
# Import pandas (already done in your environment)
import pandas as pd

# Step 1: Load CSV into a DataFrame
airline_bumping = pd.read_csv("datasets/airline_bumping.csv")

# Step 2: Inspect the first few rows
print(airline_bumping.head())

             airline  year  nb_bumped  total_passengers
0    DELTA AIR LINES  2017        679          99796155
1     VIRGIN AMERICA  2017        165           6090029
2    JETBLUE AIRWAYS  2017       1475          27255038
3    UNITED AIRLINES  2017       2067          70030765
4  HAWAIIAN AIRLINES  2017         92           8422734


In [3]:
# Step 3: Group by airline and sum the relevant columns
airline_totals = airline_bumping.groupby("airline")[["nb_bumped", "total_passengers"]].sum()

# Step 4: Calculate bumps per 10,000 passengers
airline_totals["bumps_per_10k"] = airline_totals["nb_bumped"] / airline_totals["total_passengers"] * 10000

# Step 5: Display the results
print(airline_totals)

                     nb_bumped  total_passengers  bumps_per_10k
airline                                                        
ALASKA AIRLINES           1392          36543121       0.380920
AMERICAN AIRLINES        11115         197365225       0.563169
DELTA AIR LINES           1591         197033215       0.080748
EXPRESSJET AIRLINES       3326          27858678       1.193883
FRONTIER AIRLINES         1228          22954995       0.534960
HAWAIIAN AIRLINES          122          16577572       0.073593
JETBLUE AIRWAYS           3615          53245866       0.678926
SKYWEST AIRLINES          3094          47091737       0.657015
SOUTHWEST AIRLINES       18585         228142036       0.814624
SPIRIT AIRLINES           2920          32304571       0.903897
UNITED AIRLINES           4941         134468897       0.367446
VIRGIN AMERICA             242          12017967       0.201365


## Exercise: DataFrame to CSV

After analyzing airline bumping data, the next step is to make your results easier to share. You’ll sort the DataFrame based on the number of passengers bumped per 10,000 passengers and save it as a CSV file for colleagues.

### Instructions

1. Sort the `airline_totals` DataFrame by `bumps_per_10k` in descending order and store it as `airline_totals_sorted`.
2. Print the sorted DataFrame to inspect it.
3. Export the sorted DataFrame to a CSV file named `airline_totals_sorted.csv` in the `datasets` folder.

In [4]:
# Step 1: Sort the DataFrame by bumps_per_10k in descending order
airline_totals_sorted = airline_totals.sort_values("bumps_per_10k", ascending=False)

# Step 2: Print the sorted DataFrame
print(airline_totals_sorted)

# Step 3: Export the sorted DataFrame to CSV
airline_totals_sorted.to_csv("datasets/airline_totals_sorted.csv")

                     nb_bumped  total_passengers  bumps_per_10k
airline                                                        
EXPRESSJET AIRLINES       3326          27858678       1.193883
SPIRIT AIRLINES           2920          32304571       0.903897
SOUTHWEST AIRLINES       18585         228142036       0.814624
JETBLUE AIRWAYS           3615          53245866       0.678926
SKYWEST AIRLINES          3094          47091737       0.657015
AMERICAN AIRLINES        11115         197365225       0.563169
FRONTIER AIRLINES         1228          22954995       0.534960
ALASKA AIRLINES           1392          36543121       0.380920
UNITED AIRLINES           4941         134468897       0.367446
VIRGIN AMERICA             242          12017967       0.201365
DELTA AIR LINES           1591         197033215       0.080748
HAWAIIAN AIRLINES          122          16577572       0.073593
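
One way to sanity-check an export like this is to read the file back in. The sketch below does a round trip through a temporary directory with a small, made-up DataFrame; the file name `totals.csv` and the values are hypothetical. Because `to_csv` writes the index as the first column by default, `index_col="airline"` is passed to `pd.read_csv` to restore it as the index.

```python
import os
import tempfile

import pandas as pd

# Hypothetical summary table with the airline name as the index.
totals = pd.DataFrame(
    {"nb_bumped": [40, 5], "total_passengers": [300000, 50000]},
    index=pd.Index(["AIRLINE A", "AIRLINE B"], name="airline"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "totals.csv")

    # Write the table, index included, then load it again.
    totals.to_csv(path)
    restored = pd.read_csv(path, index_col="airline")

print(restored)
```

Without `index_col`, the airline names would come back as an ordinary column and the rows would get a fresh integer index.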
