# Now You Code In Class: 
## Tricks of The Pandas Masters Volume I

We will try something a bit different for our activity: a series of Pandas coding challenges!

Datasets we will use:

- https://raw.githubusercontent.com/mafudge/datasets/master/flights/sample-flights.csv
- https://raw.githubusercontent.com/mafudge/datasets/master/orders/sample-orders.csv


In [2]:
import pandas as pd
import numpy as np
from IPython.display import display
from ipywidgets import widgets, interact_manual

pd.set_option('display.max_colwidth', None)

## Reading a dataset into a dataframe.

The following code loads the airline flights dataset into the variable `flights`

In [3]:
flights = pd.read_csv("https://raw.githubusercontent.com/mafudge/datasets/master/flights/sample-flights.csv")
flights.head()

Unnamed: 0,flight_number,departure_airport_code,arrival_airport_code,departure_date,arrival_date,departure_time,arrival_time,flight_duration,airline_name,aircraft_type
0,1350,KJP,VOG,2022-03-26,2022-03-07,5:04,23:25,10.96,United,Embraer E190
1,5381,FUN,POW,2022-11-01,2022-07-05,19:32,13:09,10.29,Southwest,Embraer E190
2,2892,ROR,COO,2022-11-09,2022-05-16,0:02,19:45,10.65,Delta,Boeing 747
3,2406,XGA,HCM,2022-01-09,2022-02-13,19:32,11:45,12.2,American,Boeing 737
4,1261,TDK,LKU,2022-02-07,2022-01-26,4:25,16:50,7.05,United,Boeing 737


In [None]:
# PROMPT 1 - read the orders data into a variable called "orders" and display the first few rows


## What does the data look like?

This code uses `info()` to get information about the columns and datatypes of the dataframe.


In [None]:
flights.info()
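
As a rough sketch of what to look for in the `info()` output: when a column has missing values, its non-null count is lower than the number of rows. The small `toy` dataframe below is made up for illustration and is not part of either dataset. `isna().sum()` reports the same information as a count of missing values per column.

```python
import pandas as pd

# Hypothetical toy data, not taken from the flights or orders datasets
toy = pd.DataFrame({
    "flight_number": [101, 102, 103],
    "airline_name": ["United", None, "Delta"],
})

toy.info()  # airline_name reports fewer non-null values than there are rows

# Count the missing values in each column
missing_counts = toy.isna().sum()
print(missing_counts)
```

A column with a missing count of zero has a value in every row.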

In [None]:
# PROMPT 2 - get information for the "orders" dataframe
# does every order have an email?


## What are the different aircraft names?

This code will use `value_counts()` to produce counts of the different aircraft names

In [None]:
flights["aircraft_type"].value_counts()
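
A minimal sketch of two `value_counts()` options that can be handy here, using a made-up `toy` dataframe rather than the real data: `normalize=True` returns proportions instead of counts, and `dropna=False` includes missing values in the tally.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame(
    {"aircraft_type": ["Boeing 737", "Boeing 747", "Boeing 737", None]}
)

# Default behavior: counts per value, missing values excluded
counts = toy["aircraft_type"].value_counts()

# Proportions of the non-missing values instead of raw counts
shares = toy["aircraft_type"].value_counts(normalize=True)

# Include the missing value as its own entry
with_missing = toy["aircraft_type"].value_counts(dropna=False)

print(counts)
```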

In [None]:
# PROMPT 3  - get the value counts for order status


## This prints a list of unique airline names

We use `unique()` on the series to get a list of unique values, and `dropna()` to get rid of the empty values.

In [4]:
airlines = list(flights['airline_name'].dropna().unique())
airlines.sort()
print(airlines)

['American', 'Delta', 'Jetblue', 'Southwest', 'United']
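
To see why `dropna()` matters, here is a small sketch on a made-up series (the `names` data is hypothetical). Without `dropna()`, the missing value stays in the unique list, and sorting a list that mixes strings with a missing value typically raises a `TypeError`.

```python
import pandas as pd

# Hypothetical toy series with one missing value
names = pd.Series(["United", "Delta", None, "United"])

# Without dropna(): the missing value is still one of the unique items
raw = list(names.unique())

# With dropna(): only real values remain, so the list sorts cleanly
clean = sorted(names.dropna().unique())
print(clean)
```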


In [5]:
# PROMPT 4 - get a unique list of the customer country this time using the provided dedupe_series function
def dedupe_series(series: pd.Series) -> list:
    items = sorted(list(series.dropna().unique()))
    return items

#Your Code Here

## Create a drop-down list of airlines.

This creates a drop-down selection widget based on the airline values.

In [None]:
airline_dropdown = widgets.Dropdown(options=airlines, description="Airline")
display(airline_dropdown)

In [None]:
# PROMPT 5 - create a dropdown of countries from orders


## Get stats on the numerical columns

The `describe()` method computes summary statistics (count, mean, standard deviation, min, quartiles, and max) for the numerical columns in the dataframe.

In [None]:
flights.describe()

In [None]:
# PROMPT 6 - what is the least expensive order? Most expensive shipping amount?


## Storing Min and Max in variables

This example stores the shortest and longest flights in separate variables.

In [6]:
shortest = flights['flight_duration'].min()
longest = flights['flight_duration'].max()
print(shortest, longest)

1.02 15.99
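
`min()` and `max()` return only the values. If you also want the rows they came from, one possible approach is `idxmin()` / `idxmax()` combined with `.loc`. The sketch below uses a made-up `toy` dataframe, not the real flights data.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "flight_number": [101, 102, 103],
    "flight_duration": [10.96, 1.02, 7.05],
})

shortest = toy["flight_duration"].min()

# idxmin()/idxmax() return the index label of the smallest/largest value;
# .loc then looks up the whole row for that label
shortest_row = toy.loc[toy["flight_duration"].idxmin()]
longest_row = toy.loc[toy["flight_duration"].idxmax()]

print(shortest, shortest_row["flight_number"])
```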


In [None]:
# PROMPT 7 - store the largest and smallest order totals in variables.


## Creating a Range Slider widget

This example creates a Range slider widget for flight duration, setting the upper and lower bounds to the min/max values.

In [7]:
flight_duration_slider = widgets.FloatRangeSlider(
    min = shortest, max=longest, step=0.5, description="Duration")
display(flight_duration_slider)

FloatRangeSlider(value=(4.7625, 12.2475), description='Duration', max=15.99, min=1.02, step=0.5)

In [None]:
# PROMPT 8 - Create a range slider for orders using min/max approach


## Let's engineer a column!

This example creates a `departure_year` column by slicing the first 4 characters from the date. Because the data type of `departure_date` is `object` (text), we must use the `.str` accessor to apply the string slice to every value in the column.


In [10]:
flights["departure_year"] = flights["departure_date"].str[:4]
flights.head()

Unnamed: 0,flight_number,departure_airport_code,arrival_airport_code,departure_date,arrival_date,departure_time,arrival_time,flight_duration,airline_name,aircraft_type,departure_year
0,1350,KJP,VOG,2022-03-26,2022-03-07,5:04,23:25,10.96,United,Embraer E190,2022
1,5381,FUN,POW,2022-11-01,2022-07-05,19:32,13:09,10.29,Southwest,Embraer E190,2022
2,2892,ROR,COO,2022-11-09,2022-05-16,0:02,19:45,10.65,Delta,Boeing 747,2022
3,2406,XGA,HCM,2022-01-09,2022-02-13,19:32,11:45,12.2,American,Boeing 737,2022
4,1261,TDK,LKU,2022-02-07,2022-01-26,4:25,16:50,7.05,United,Boeing 737,2022
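
Note that string slicing produces text, so the year is `"2022"` rather than the number `2022`. As an alternative sketch (an assumption on my part, not the approach used in this notebook), you could parse the dates with `pd.to_datetime()` and read the parts through the `.dt` accessor. The `toy` dataframe and the new column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data in the same date format as the flights dataset
toy = pd.DataFrame({"departure_date": ["2022-03-26", "2022-11-01"]})

# String slicing: the result is still text
toy["year_text"] = toy["departure_date"].str[:4]

# Alternative: parse to datetimes first, then extract numeric parts
parsed = pd.to_datetime(toy["departure_date"])
toy["year_number"] = parsed.dt.year
toy["month_number"] = parsed.dt.month

print(toy)
```

Text years work well as drop-down options; numeric years are more convenient for comparisons such as `>=`.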


In [None]:
# PROMPT 9 - create an order year column!


In [9]:
# PROMPT 9.5
orders['orderstatus'].str.title()

NameError: name 'orders' is not defined

In [None]:
# PROMPT 10 - create an order month column!


## United Airlines flights

This example uses a boolean filter to create a smaller dataframe of just United Airlines flights.

In [12]:

ua_filter= flights["airline_name"] == "United"
ua_flights = flights[ua_filter]
ua_flights

Unnamed: 0,flight_number,departure_airport_code,arrival_airport_code,departure_date,arrival_date,departure_time,arrival_time,flight_duration,airline_name,aircraft_type,departure_year
0,1350,KJP,VOG,2022-03-26,2022-03-07,5:04,23:25,10.96,United,Embraer E190,2022
4,1261,TDK,LKU,2022-02-07,2022-01-26,4:25,16:50,7.05,United,Boeing 737,2022
6,7066,IXC,YTJ,2022-03-23,2022-12-19,10:51,21:11,13.01,United,Boeing 747,2022
7,5122,YVC,EGI,2022-03-19,2022-08-23,21:10,7:47,12.92,United,Embraer E190,2022
12,2730,LSZ,DTD,2022-11-15,2022-07-02,9:21,11:19,6.95,United,Boeing 777,2022
...,...,...,...,...,...,...,...,...,...,...,...
970,5726,DBS,TAC,2022-12-17,2022-05-11,6:23,2:05,15.15,United,Airbus A350,2022
976,9830,MQT,BGA,2022-08-25,2022-06-23,8:14,14:59,3.05,United,Boeing 747,2022
985,4517,BMG,MRF,2022-12-19,2022-08-20,9:25,18:49,4.93,United,Embraer E190,2022
989,4029,HEW,NRB,2022-09-06,2022-04-30,15:19,19:07,14.78,United,Boeing 777,2022


In [None]:
# PROMPT 10.5 - display only orders that were delivered


## Dataframe Boolean Filters with logical And

Sometimes you want to filter a dataframe on two conditions, for example:


    - American Airlines AND
    - Boeing 777 aircraft

To do this we must use the dataframe AND operator: `&`

Notice how we must include `()` around each boolean filter.

In [18]:
 

new_filter = (flights["airline_name"] == "American") & (flights["aircraft_type"] == "Boeing 777")
special_flights = flights[new_filter]
special_flights.head()

Unnamed: 0,flight_number,departure_airport_code,arrival_airport_code,departure_date,arrival_date,departure_time,arrival_time,flight_duration,airline_name,aircraft_type,departure_year
21,2329,ELA,XPK,2022-08-01,2022-03-02,22:09,7:53,7.79,American,Boeing 777,2022
40,4116,ZLX,ASY,2022-03-01,2022-09-28,2:51,10:53,14.57,American,Boeing 777,2022
52,2567,DGF,OSP,2022-03-06,2022-08-21,13:38,22:22,14.65,American,Boeing 777,2022
78,9761,SDY,BSF,2022-05-02,2022-01-06,23:11,18:50,7.35,American,Boeing 777,2022
114,6574,GZP,FEB,2022-04-30,2022-09-10,6:32,13:48,15.44,American,Boeing 777,2022
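
The parentheses are needed because `&` binds more tightly than `==` in Python, so without them the expression is not grouped the way you intend. One way to sidestep this, sketched below on a made-up `toy` dataframe, is to store each condition in its own variable first. The same pattern works with `|` (OR) and `~` (NOT).

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "airline_name": ["American", "American", "United"],
    "aircraft_type": ["Boeing 777", "Boeing 737", "Boeing 777"],
})

# Each condition is a Series of True/False values, one per row
is_american = toy["airline_name"] == "American"
is_777 = toy["aircraft_type"] == "Boeing 777"

both = toy[is_american & is_777]     # AND: both conditions hold
either = toy[is_american | is_777]   # OR: at least one condition holds
not_american = toy[~is_american]     # NOT: invert a condition

print(both)
```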


In [17]:
# PROMPT 11 - show "special orders": those orders delivered to Canada in the year 2023
(flights["airline_name"] == "American") & (flights["aircraft_type"] == "Boeing 777")

0      False
1      False
2      False
3      False
4      False
       ...  
995    False
996    False
997    False
998    False
999    False
Length: 1000, dtype: bool

## Flight Tracker

Inputs:

    - Range for the duration of the flight
    - Airline
    
Outputs:
    
    - DataFrame of flights matching those criteria


In [None]:
# Get Data
airlines = sorted(list(flights['airline_name'].dropna().unique()))
shortest = flights['flight_duration'].min()
longest = flights['flight_duration'].max()

# Make widgets
airline_dropdown = widgets.Dropdown(options=airlines, description="Airline")
flight_duration_slider = widgets.FloatRangeSlider(
    min = shortest, max=longest, step=0.5, description="Duration")

@interact_manual(airline=airline_dropdown, duration=flight_duration_slider)
def on_click(airline, duration):
    filtered_flights = flights[
        (flights["airline_name"] == airline) &
        (flights["flight_duration"] >= duration[0]) &
        (flights["flight_duration"] <= duration[1])
    ]
    display(filtered_flights)
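
One design choice worth considering: keep the filtering logic in a plain function so it can be checked without clicking any widgets. The sketch below is an assumption about how that might look, not part of the original activity. `filter_flights` is a hypothetical helper, the `toy` data is made up, and `Series.between()` (inclusive on both ends by default) stands in for the pair of `>=` and `<=` comparisons.

```python
import pandas as pd

def filter_flights(df, airline, duration):
    """Return rows for one airline whose duration falls within (low, high)."""
    low, high = duration  # a range slider supplies a (low, high) tuple
    mask = (df["airline_name"] == airline) & df["flight_duration"].between(low, high)
    return df[mask]

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "flight_number": [1, 2, 3, 4],
    "airline_name": ["United", "United", "Delta", "United"],
    "flight_duration": [10.96, 3.05, 7.0, 12.0],
})

result = filter_flights(toy, "United", (5.0, 12.0))
print(result)
```

Inside the widget callback you would then only need to call the helper and `display()` its result.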



## Order Report

Inputs:

    - Range Slider for the order amount total
    - Year of order, Order Status, Customer Country as drop downs
    
Outputs:
    
    - DataFrame of orders matching the selected criteria


In [None]:
orders.columns # a refresher of the available columns

In [None]:
# PROMPT 12 - make the order report!

# Get Data for widgets
orders = pd.read_csv("https://raw.githubusercontent.com/mafudge/datasets/master/orders/sample-orders.csv")
orders['orderyear'] = orders['orderdate'].str[:4]

# Make widgets


# main interact
@interact_manual(    )
def on_click(   ):
    
    
    
    display(filtered_orders)