In [34]:
# Imports
import pandas as pd
import numpy as np

In [3]:
# Read data from csv file
dfa = pd.read_csv('http://stat-computing.org/dataexpo/2009/airports.csv', delimiter=',', index_col='iata')

In [4]:
# show head of specified columns
dfa[['airport','city','state']].head()

iata,airport,city,state
00M,Thigpen,Bay Springs,MS
00R,Livingston Municipal,Livingston,TX
00V,Meadow Lake,Colorado Springs,CO
01G,Perry-Warsaw,Perry,NY
01J,Hilliard Airpark,Hilliard,FL


In [5]:
# extract last four rows for 3 cols
dfa[['airport', 'lat', 'long']].tail(4)

iata,airport,lat,long
ZER,Schuylkill Cty/Joe Zerbey,40.706449,-76.373147
ZPH,Zephyrhills Municipal,28.228065,-82.155916
ZUN,Black Rock,35.083227,-108.791777
ZZV,Zanesville Municipal,39.944458,-81.892105


In [6]:
# Display the data types of each column
dfa.dtypes

airport     object
city        object
state       object
country     object
lat        float64
long       float64
dtype: object

In [7]:
# Display a summary of the numerical information in the DataFrame
dfa.describe()

,lat,long
count,3376.0,3376.0
mean,40.036524,-98.621205
std,8.329559,22.869458
min,7.367222,-176.646031
25%,34.688427,-108.761121
50%,39.434449,-93.599425
75%,43.372612,-84.137519
max,71.285448,145.621384


## SLICING

Because this DataFrame was created with a labelled row index (the `iata` column), rows can be selected by label. The following code cells demonstrate basic slicing and indexing of this DataFrame with both explicit indices (row and column labels, via `loc`) and implicit indices (integer row and column positions, via `iloc` or plain `[]` slicing).

In [8]:
# Slice rows by using the indicated label from the index column (EXPLICIT)

dfa.loc[['11J','11R','12C']]

iata,airport,city,state,country,lat,long
11J,Early County,Blakely,GA,USA,31.396986,-84.895257
11R,Brenham Municipal,Brenham,TX,USA,30.219,-96.374278
12C,Rochelle Municipal,Rochelle,IL,USA,41.893001,-89.07829


In [10]:
# Slice rows and columns by using explicit row and column labels
dfa.loc[['11J','11R'],['airport','city','state']]

iata,airport,city,state
11J,Early County,Blakely,GA
11R,Brenham Municipal,Brenham,TX


In [9]:
# Slice rows by using the implicit row index, i.e. integer positions (IMPLICIT)
dfa[99:103]

iata,airport,city,state,country,lat,long
11J,Early County,Blakely,GA,USA,31.396986,-84.895257
11R,Brenham Municipal,Brenham,TX,USA,30.219,-96.374278
12C,Rochelle Municipal,Rochelle,IL,USA,41.893001,-89.07829
12D,Tower Municipal,Tower,MN,USA,47.818333,-92.291667


In [11]:
# Slice rows and columns by using implicit row and column indices
dfa.iloc[99:103,:2]

iata,airport,city
11J,Early County,Blakely
11R,Brenham Municipal,Brenham
12C,Rochelle Municipal,Rochelle
12D,Tower Municipal,Tower
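
One difference between the two styles is easy to miss: a label slice with `loc` includes both endpoints, while a position slice with `iloc` excludes the stop position. The sketch below illustrates this on a small hand-built DataFrame. The name `demo` is hypothetical, and its rows are copied from the outputs above so that no download is needed.

```python
import pandas as pd

# Hypothetical four-row sample; the values are copied from the outputs above.
demo = pd.DataFrame(
    {
        "airport": ["Early County", "Brenham Municipal",
                    "Rochelle Municipal", "Tower Municipal"],
        "city": ["Blakely", "Brenham", "Rochelle", "Tower"],
        "state": ["GA", "TX", "IL", "MN"],
    },
    index=pd.Index(["11J", "11R", "12C", "12D"], name="iata"),
)

# Label-based slice (EXPLICIT): both endpoints are included,
# for the row labels and for the column labels.
by_label = demo.loc["11J":"12C", "airport":"city"]

# Position-based slice (IMPLICIT): the stop position is excluded.
by_position = demo.iloc[0:2, :2]

print(by_label)
print(by_position)
```

Here `by_label` holds three rows (`11J`, `11R`, `12C`) while `by_position` holds two, even though both select the `airport` and `city` columns.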


## MASKING

- Pandas also supports selecting rows based on column values, which is known as masking.
- A mask is built from tests on columns that evaluate to True or False for each row, and only the rows where the result is True are returned.
- In other words, rows where the mask is False are hidden (masked), and the remaining rows are selected.
- The tests follow the rules of Boolean logic and can combine comparisons on multiple columns into one final result.
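
To make the mask itself visible, the following sketch builds the Boolean Series separately before using it to select rows. The DataFrame `demo` and the mask names are hypothetical; the values are copied from outputs in this notebook.

```python
import pandas as pd

# Hypothetical sample; values copied from the outputs in this notebook.
demo = pd.DataFrame(
    {
        "state": ["DE", "DE", "AK", "AK", "WY"],
        "lat": [39.218376, 39.130113, 51.877964, 63.766766, 41.155722],
        "long": [-75.596427, -75.466310, -176.646031, -171.732824, -104.811838],
    },
    index=pd.Index(["33N", "DOV", "ADK", "GAM", "CYS"], name="iata"),
)

# A comparison on a column yields a Boolean Series aligned with the rows.
is_delaware = demo["state"] == "DE"

# Combine tests with & (and), | (or), and ~ (not). Each comparison needs its
# own parentheses because these operators bind more tightly than == or >.
far_west = (demo["lat"] > 48) & (demo["long"] < -170)
neither = ~(is_delaware | far_west)

print(is_delaware)
print(demo[far_west])
print(demo[neither])
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the symbolic operators are used here.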

In [14]:
# Select rows where the state column equals 'DE'
dfa[dfa['state'] == 'DE']

iata,airport,city,state,country,lat,long
33N,Delaware Airpark,Dover,DE,USA,39.218376,-75.596427
DOV,Dover Air Force Base,Dover,DE,USA,39.130113,-75.46631
EVY,Summit Airpark,Middletown,DE,USA,39.520389,-75.720444
GED,Sussex Cty Arpt,Georgetown,DE,USA,38.689194,-75.358889
ILG,New Castle County,Wilmington,DE,USA,39.678722,-75.606528


In [15]:
# Select rows by combining Boolean tests on several columns
dfa[(dfa.lat > 48) & (dfa.long < -170)]

iata,airport,city,state,country,lat,long
ADK,Adak,Adak,AK,USA,51.877964,-176.646031
AKA,Atka,Atka,AK,USA,52.220348,-174.20635
GAM,Gambell,Gambell,AK,USA,63.766766,-171.732824
SNP,St. Paul,St. Paul,AK,USA,57.167333,-170.220444
SVA,Savoonga,Savoonga,AK,USA,63.686394,-170.492636


## SPREADSHEET FUNCTIONS

Pandas provides additional functions, of the kind often found in spreadsheets, that simplify data processing tasks. The following code cells demonstrate three of them:

- __sample__: randomly selects n rows, where n is specified as an argument to the sample function
- __sort_index__: sorts the DataFrame based on the values in the index
- __sort_values__: sorts the DataFrame by the column specified in the `by` parameter

Note that __the sort functions return a new DataFrame__; to sort a DataFrame in place you must set the `inplace` parameter to True. In addition, the sort functions take an `ascending` parameter that specifies whether the sort should be in ascending or descending order.
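
The difference between a sort that returns a new DataFrame and an in-place sort can be checked on a small example. This is a minimal sketch: `demo` and the other variable names are hypothetical, and `random_state=0` is only an illustrative choice that makes the random draw repeatable.

```python
import pandas as pd

# Hypothetical three-row sample with a deliberately unsorted index.
demo = pd.DataFrame(
    {"airport": ["Cheyenne", "Thigpen", "Perry-Warsaw"],
     "state": ["WY", "MS", "NY"]},
    index=pd.Index(["CYS", "00M", "01G"], name="iata"),
)

# sample: draw n rows at random; random_state makes the draw repeatable.
picked = demo.sample(2, random_state=0)

# sort_values returns a new DataFrame and leaves demo untouched.
by_state = demo.sort_values(by="state", ascending=False)
order_before = list(demo.index)

# With inplace=True the method modifies demo itself and returns None.
result = demo.sort_index(inplace=True)
order_after = list(demo.index)

print(by_state)
print(order_before, order_after, result)
```

Because an in-place sort returns `None`, writing `demo = demo.sort_index(inplace=True)` would discard the DataFrame; use either the assignment form or `inplace=True`, not both.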


In [16]:
# Randomly select three rows
dfa.sample(3)

iata,airport,city,state,country,lat,long
9S9,Lexington,Lexington,OR,USA,45.452631,-119.688636
POY,Powell Muni,Powell,WY,USA,44.867972,-108.793
AXS,Altus Municipal,Altus,OK,USA,34.698782,-99.3381


In [18]:
# Sort by the index (returns a new DataFrame) and show the first rows
dfa.sort_index().head()

iata,airport,city,state,country,lat,long
00M,Thigpen,Bay Springs,MS,USA,31.953765,-89.234505
00R,Livingston Municipal,Livingston,TX,USA,30.685861,-95.017928
00V,Meadow Lake,Colorado Springs,CO,USA,38.945749,-104.569893
01G,Perry-Warsaw,Perry,NY,USA,42.741347,-78.052081
01J,Hilliard Airpark,Hilliard,FL,USA,30.688012,-81.905944


In [19]:
dfa.head()

iata,airport,city,state,country,lat,long
00M,Thigpen,Bay Springs,MS,USA,31.953765,-89.234505
00R,Livingston Municipal,Livingston,TX,USA,30.685861,-95.017928
00V,Meadow Lake,Colorado Springs,CO,USA,38.945749,-104.569893
01G,Perry-Warsaw,Perry,NY,USA,42.741347,-78.052081
01J,Hilliard Airpark,Hilliard,FL,USA,30.688012,-81.905944


In [26]:
# Sort by the state column in descending order and show the first rows
dfa.sort_values('state', ascending=False).head()

iata,airport,city,state,country,lat,long
CYS,Cheyenne,Cheyenne,WY,USA,41.155722,-104.811838
AFO,Afton Municipal,Afton,WY,USA,42.711246,-110.942164
LSK,Lusk Muni,Lusk,WY,USA,42.753808,-104.404554
PNA,Ralph Wenz,Pinedale,WY,USA,42.795499,-109.807094
WRL,Worland Muni,Worland,WY,USA,43.965713,-107.950831


In [27]:
dfa.head()

iata,airport,city,state,country,lat,long
CYS,Cheyenne,Cheyenne,WY,USA,41.155722,-104.811838
LSK,Lusk Muni,Lusk,WY,USA,42.753808,-104.404554
EVW,Evanston-Uinta County Burns,Evanston,WY,USA,41.274945,-111.032129
BYG,Johnson County,Buffalo,WY,USA,44.381085,-106.72179
RIW,Riverton Regional,Riverton,WY,USA,43.064235,-108.459841


In [29]:
# Sort by the index in place (modifies dfa and returns None)
dfa.sort_index(inplace=True)

In [30]:
dfa.head()

iata,airport,city,state,country,lat,long
00M,Thigpen,Bay Springs,MS,USA,31.953765,-89.234505
00R,Livingston Municipal,Livingston,TX,USA,30.685861,-95.017928
00V,Meadow Lake,Colorado Springs,CO,USA,38.945749,-104.569893
01G,Perry-Warsaw,Perry,NY,USA,42.741347,-78.052081
01J,Hilliard Airpark,Hilliard,FL,USA,30.688012,-81.905944


## EXERCISE

Q. First, extract all airports in the state of California. Second, apply a mask to select only those rows with a latitude between `38` and `40`. Finally, compute and display the average and standard deviation of the longitude for these masked rows.

In [33]:
# Mask: California airports with a latitude strictly between 38 and 40
dfca = dfa[(dfa.state == 'CA') & (dfa.lat > 38) & (dfa.lat < 40)]
dfca.head(2)

iata,airport,city,state,country,lat,long
0O3,Calaveras Co-Maury Rasmussen,San Andreas,CA,USA,38.146116,-120.648173
0O4,Corning Municipal,Corning,CA,USA,39.943768,-122.171378


In [36]:
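# Mean and standard deviation of the longitude for the masked rows.
# np.std uses ddof=0 by default, i.e. the population standard deviation.
avg = np.average(dfca.long)
sd = np.std(dfca.long)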
print("The Average of the longitude is {:6.4f}".format(avg))
print("The standard deviation of the longitude is {:6.4f}".format(sd))

The Average of the longitude is -121.7127
The standard deviation of the longitude is 0.9616
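
The standard deviation printed above is the population value, because `np.std` defaults to `ddof=0`. The pandas `std` method defaults to `ddof=1` (the sample standard deviation) and gives a slightly larger number on the same data. The sketch below shows the difference on made-up longitudes; the Series `long` and its values are hypothetical and are not taken from the airports file.

```python
import numpy as np
import pandas as pd

# Hypothetical longitudes for illustration only (not the notebook's data).
long = pd.Series([-120.0, -121.0, -122.0, -123.0], name="long")

# NumPy default: population standard deviation (divide by n).
population_sd = np.std(long.to_numpy())

# pandas default: sample standard deviation (divide by n - 1).
sample_sd = long.std()

# Passing ddof=0 to pandas reproduces the NumPy default.
matched_sd = long.std(ddof=0)

print("population: {:6.4f}".format(population_sd))
print("sample:     {:6.4f}".format(sample_sd))
print("matched:    {:6.4f}".format(matched_sd))
```

Which definition is appropriate depends on whether the rows are treated as the whole population or as a sample from it; the exercise does not say, so either reading is defensible as long as the choice is stated.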
