## Part 1: A quick tour of Jupyter notebooks

In [45]:
# Basic "unit" is a cell

Cells can be code or text (markdown)

In [46]:
# How do we execute a line of code?

# TODO: Write some code to compute the value of 3+3
3 + 3

6

In [47]:
# We can store variables

# TODO: Make a variable called myDog and assign it the value "Laika"
myDog = "Laika"

In [48]:
# We can view the variable directly, or print it out
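
The cell above has no code yet, so here is a minimal sketch of the difference. When a variable is the last line of a cell, the notebook echoes its `repr`, while `print` writes the plain string form. `myDog` is redefined so the snippet runs on its own; `echoed` and `printed` are names made up for illustration.

```python
myDog = "Laika"

# What the notebook shows when `myDog` is the last line of a cell
echoed = repr(myDog)

# What print(myDog) writes to the output area
printed = str(myDog)

print(echoed)   # quoted form
print(printed)  # unquoted form
```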

In [49]:
# We can also write functions and execute them

# TODO: Write a function that accepts a string and then does something, anything, with it

def isDogAwesome(dog):
    if dog == "Laika":
        print(dog + " is so rad!")
    else:
        print("Maybe? I'd love to meet " + dog)

In [50]:
# TODO: Write code that calls the function you just wrote and passes in the variable myDog
isDogAwesome(myDog)

Laika is so rad!


In [51]:
# Remember...
# Order matters!
# You can execute a single cell, or run them all...
# You can also MOVE cells around... or insert cells anywhere
# And you can split cells...or merge them
# It can all get confusing... if you need a reset, you can always restart your Kernel
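
A small sketch of why order matters. Pretend the commented sections are separate cells; the variable `count` is hypothetical. The kernel remembers the current value, so running the same cell twice is not the same as running it once.

```python
# Cell A
count = 1

# Cell B
count = count + 1   # after A then B, count is 2

# Running Cell B a second time (without re-running Cell A)
count = count + 1   # the kernel still holds 2, so count becomes 3

print(count)
```

Restarting the kernel clears `count` entirely, so Cell B would fail until Cell A is run again.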

In [52]:
# And we can import packages
# These can be packages we write ourselves, i.e. a .py file
# Or they can be packages we have installed on our computer (a bunch came with Anaconda)
# We'll be using numpy and pandas to start

# TODO: Write the code you need to import Numpy and Pandas
# Note: np and pd are shorthand, and they are conventions (you could call them whatever you want)
import numpy as np
import pandas as pd


## Part 2: Let's get some data!

### Let's import some data!
We'll be using data about the **cost of launching stuff into space** over time. It comes from [Our World in Data](https://ourworldindata.org/grapher/cost-space-launches-low-earth-orbit).

Go check out this link (always always seek out the source!). What do you notice?

In [53]:
# Let's load in the data, which is in the form of a CSV file

# TODO: Write a line of code to import "cost-space-launches.csv" and save it as a DataFrame called df_launch
# Note: Where is your CSV file, and do you have the correct file path?
df_launch = pd.read_csv("cost-space-launches.csv")
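
If you want to see what `pd.read_csv` does without worrying about file paths, here is a sketch that reads CSV text from memory instead of from disk. The two rows are copied from the table shown in this notebook; `csv_text` and `df_sample` are stand-in names, not part of the real data file.

```python
import io

import pandas as pd

# Two rows in the same layout as the launch data (Code is empty on purpose)
csv_text = """Entity,Code,Year,cost_per_kg,launch_class
Angara,,2014,4500,Heavy
Antares,,2013,13600,Medium
"""

# read_csv accepts a file path or any file-like object
df_sample = pd.read_csv(io.StringIO(csv_text))

print(df_sample)
```

Empty fields such as `Code` are read in as `NaN`, which is how pandas marks missing values.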

In [54]:
# Now let's look at it
# What do you notice?
df_launch

,Entity,Code,Year,cost_per_kg,launch_class
0,Angara,,2014,4500,Heavy
1,Antares,,2013,13600,Medium
2,Ariane 44,,1988,18300,Medium
3,Ariane 5G,,1997,10200,Heavy
4,Athena 1,,1997,19200,Small
...,...,...,...,...,...
56,Titan III+,,1965,21000,Medium
57,Titan IV,,1989,30800,Heavy
58,Vega,,2012,20000,Small
59,Zenit 2,,1985,5100,Medium


In [55]:
# How many columns? What are they?
# How many rows? What does each row represent?

# TODO: Write a line of code that displays the header
# TODO: Write a line of code that displays the first 10 rows
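
One possible way to answer the questions above, sketched on a five-row stand-in (`df_sample`) built from rows shown in this notebook, so the snippet runs without the CSV file.

```python
import pandas as pd

# Five-row stand-in for df_launch
df_sample = pd.DataFrame({
    "Entity": ["Angara", "Antares", "Ariane 44", "Ariane 5G", "Athena 1"],
    "Code": [float("nan")] * 5,
    "Year": [2014, 2013, 1988, 1997, 1997],
    "cost_per_kg": [4500, 13600, 18300, 10200, 19200],
    "launch_class": ["Heavy", "Medium", "Medium", "Heavy", "Small"],
})

column_names = list(df_sample.columns)  # the header, as a plain list
n_rows, n_cols = df_sample.shape        # (number of rows, number of columns)
first_rows = df_sample.head(3)          # first 3 rows; head() defaults to 5

print(column_names)
print(n_rows, n_cols)
```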

In [56]:
df_launch.head(10)

,Entity,Code,Year,cost_per_kg,launch_class
0,Angara,,2014,4500,Heavy
1,Antares,,2013,13600,Medium
2,Ariane 44,,1988,18300,Medium
3,Ariane 5G,,1997,10200,Heavy
4,Athena 1,,1997,19200,Small
5,Atlas Centaur,,1963,29500,Medium
6,Atlas II,,1991,18700,Medium
7,Atlas III,,2000,16000,Medium
8,Atlas V,,2002,8100,Medium
9,Delta 3000-Series,,1975,21400,Small


In [57]:
# Let's get a summary
df_launch.describe()

,Code,Year,cost_per_kg
count,0.0,61.0,61.0
mean,,1993.262295,23278.688525
std,,16.615155,27843.174596
min,,1961.0,1500.0
25%,,1985.0,8900.0
50%,,1996.0,16000.0
75%,,2003.0,29500.0
max,,2019.0,177900.0


In [58]:
# How do we select a single column?
df_launch["Year"]

0     2014
1     2013
2     1988
3     1997
4     1997
      ... 
56    1965
57    1989
58    2012
59    1985
60    1999
Name: Year, Length: 61, dtype: int64

In [59]:
# How do we select a single row?
df_launch.iloc[2]

Entity          Ariane 44
Code                  NaN
Year                 1988
cost_per_kg         18300
launch_class       Medium
Name: 2, dtype: object

In [60]:
# And how do we select a single value within the Year column?
df_launch["Year"][0]

2014
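
A sketch of two other ways to pick out a single value: `.loc` looks rows up by label and `.iloc` by position. The two agree on a freshly loaded dataframe but drift apart once rows are sorted, which is worth knowing before the sorting step later on. `df_sample` is a three-row stand-in built from rows in this notebook.

```python
import pandas as pd

df_sample = pd.DataFrame({
    "Entity": ["Angara", "Antares", "Ariane 44"],
    "Year": [2014, 2013, 1988],
})

by_label = df_sample.loc[0, "Year"]   # row labelled 0, column "Year"
by_position = df_sample.iloc[0, 1]    # first row, second column

# After sorting, labels travel with their rows, but positions start over
sorted_sample = df_sample.sort_values(by="Year")
first_after_sort = sorted_sample.iloc[0]["Entity"]  # position 0 is now the oldest
label_zero = sorted_sample.loc[0, "Entity"]         # label 0 is still Angara

print(by_label, by_position, first_after_sort, label_zero)
```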

In [61]:
# TODO: Write a line of code to get the cost_per_kg column
df_launch["cost_per_kg"]

0      4500
1     13600
2     18300
3     10200
4     19200
      ...  
56    21000
57    30800
58    20000
59     5100
60     8900
Name: cost_per_kg, Length: 61, dtype: int64

In [62]:
# TODO: Write a line of code to get the row at index 5 (the 6th row, since counting starts at 0)
df_launch.iloc[5]

Entity          Atlas Centaur
Code                      NaN
Year                     1963
cost_per_kg             29500
launch_class           Medium
Name: 5, dtype: object

In [63]:
# TODO: Write some code to get the launch_class of the row at index 3 (the 4th row)
df_launch["launch_class"][3]

'Heavy'

In [64]:
# Now, what TYPES are they?
type(df_launch["Year"])

pandas.core.series.Series

In [65]:
type(df_launch.iloc[3])

pandas.core.series.Series

In [66]:
type(df_launch["Year"][0])

numpy.int64

In [67]:
# Series are 1-dimensional arrays (basically lists with labels)
# Individual values (cells) can be any data type (strings, integers, floating-point numbers, even lists)
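
To make the "basically lists" idea concrete, here is a small sketch of a Series built by hand. The values are three years from the launch data; the variable names are illustrative only.

```python
import pandas as pd

years = pd.Series([2014, 2013, 1988], name="Year")

as_list = years.tolist()   # back to a plain Python list
plus_one = years + 1       # arithmetic applies to every element at once
oldest = years.min()       # built-in summaries

# Unlike a list, a Series carries an index (row labels) alongside its values
labels = list(years.index)

print(as_list, oldest, labels)
```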

In [68]:
# TODO: What (Python) types are the values in the other columns? 
# We can iterate over the columns in a dataframe!
for column in df_launch:
    print(column)
    print(type(df_launch[column][0]))

Entity
<class 'str'>
Code
<class 'numpy.float64'>
Year
<class 'numpy.int64'>
cost_per_kg
<class 'numpy.int64'>
launch_class
<class 'str'>


In [81]:
# There's also an easy-peasy built-in way with pandas using .dtypes...
df_launch.dtypes

Entity           object
Code            float64
Year              int64
cost_per_kg       int64
launch_class     object
dtype: object

In [82]:
# Are you loving pandas yet??
# Another built-in trick pandas has is .info()... let's try it
df_launch.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61 entries, 0 to 60
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Entity        61 non-null     object 
 1   Code          0 non-null      float64
 2   Year          61 non-null     int64  
 3   cost_per_kg   61 non-null     int64  
 4   launch_class  61 non-null     object 
dtypes: float64(1), int64(2), object(2)
memory usage: 2.5+ KB


In [69]:
# What's going on with that "Code" column?
# It has NaN (Not a Number) as its values -- this means missing data!
# We have some tricks for dealing with this...
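
Two of those tricks, sketched on a three-row stand-in (`df_sample`): counting the missing values in each column, and dropping any column that is entirely empty.

```python
import numpy as np
import pandas as pd

df_sample = pd.DataFrame({
    "Entity": ["Angara", "Antares", "Ariane 44"],
    "Code": [np.nan, np.nan, np.nan],
    "Year": [2014, 2013, 1988],
})

# isna() marks each missing cell as True; sum() counts them per column
missing_per_column = df_sample.isna().sum()

# how="all" only drops columns where every value is missing
df_no_empty_cols = df_sample.dropna(axis="columns", how="all")

print(missing_per_column)
print(list(df_no_empty_cols.columns))
```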

In [70]:
# TODO: Make a new data frame that just has the columns we want (no Code column)
# We can select a specific subset of columns in a dataframe using a list of column names
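# .copy() makes df_new independent of df_launch, so adding columns later won't trigger a warning
df_new = df_launch[["Entity","Year","cost_per_kg","launch_class"]].copy()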

In [71]:
df_new

,Entity,Year,cost_per_kg,launch_class
0,Angara,2014,4500,Heavy
1,Antares,2013,13600,Medium
2,Ariane 44,1988,18300,Medium
3,Ariane 5G,1997,10200,Heavy
4,Athena 1,1997,19200,Small
...,...,...,...,...
56,Titan III+,1965,21000,Medium
57,Titan IV,1989,30800,Heavy
58,Vega,2012,20000,Small
59,Zenit 2,1985,5100,Medium


In [72]:
# How do we count the values of the launch class?
df_new["launch_class"].value_counts()

Medium    29
Small     23
Heavy      9
Name: launch_class, dtype: int64
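
Counting is one summary per class; here is a sketch of two more, using a five-row stand-in built from rows shown earlier. `normalize=True` turns counts into shares, and `groupby` computes a statistic separately for each class.

```python
import pandas as pd

df_sample = pd.DataFrame({
    "launch_class": ["Heavy", "Medium", "Medium", "Heavy", "Small"],
    "cost_per_kg": [4500, 13600, 18300, 10200, 19200],
})

# Fraction of rows in each class instead of raw counts
shares = df_sample["launch_class"].value_counts(normalize=True)

# Median cost per kg within each class
median_cost = df_sample.groupby("launch_class")["cost_per_kg"].median()

print(shares)
print(median_cost)
```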

In [73]:
# How would we convert cost per kilogram to cost per pound?
# TODO: Let's make a new column and do that...
# Remember, 1 kg ≈ 2.2 pounds (lbs), so a price per kg is divided by 2.2 to get a price per lb

In [74]:
df_new["cost_per_lb"] = df_new["cost_per_kg"]/2.2
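
The cell above uses the rounded factor 2.2. If you want a bit more precision, here is a sketch using the exact definition of the pound (1 lb = 0.45359237 kg). `KG_PER_LB` and `df_sample` are names made up for this example.

```python
import pandas as pd

KG_PER_LB = 0.45359237  # exact definition of the international pound

df_sample = pd.DataFrame({
    "Entity": ["Angara", "Antares"],
    "cost_per_kg": [4500, 13600],
})

# (cost / kg) * (kg / lb) = cost / lb
df_sample["cost_per_lb"] = df_sample["cost_per_kg"] * KG_PER_LB

print(df_sample)
```

The results differ from the 2.2 version by about 0.2%, which rarely matters for a quick look but can add up in a report.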

In [75]:
df_new

,Entity,Year,cost_per_kg,launch_class,cost_per_lb
0,Angara,2014,4500,Heavy,2045.454545
1,Antares,2013,13600,Medium,6181.818182
2,Ariane 44,1988,18300,Medium,8318.181818
3,Ariane 5G,1997,10200,Heavy,4636.363636
4,Athena 1,1997,19200,Small,8727.272727
...,...,...,...,...,...
56,Titan III+,1965,21000,Medium,9545.454545
57,Titan IV,1989,30800,Heavy,14000.000000
58,Vega,2012,20000,Small,9090.909091
59,Zenit 2,1985,5100,Medium,2318.181818


In [76]:
# TODO: How do we sort to see which launch vehicle cost the most (per kilogram)? The least?
df_new.sort_values(by=['cost_per_kg'], ascending=False)

,Entity,Year,cost_per_kg,launch_class,cost_per_lb
11,Delta E,1965,177900,Medium,80863.636364
46,Scout,1961,118500,Small,53863.636364
37,Minotaur I,2000,73100,Small,33227.272727
51,Space Shuttle,1981,65400,Heavy,29727.272727
41,Pegasus XL,1996,50600,Small,23000.000000
...,...,...,...,...,...
45,Saturn V,1967,5400,Heavy,2454.545455
59,Zenit 2,1985,5100,Medium,2318.181818
0,Angara,2014,4500,Heavy,2045.454545
20,Falcon 9,2010,2600,Medium,1181.818182


In [77]:
# Notice that this doesn't change the dataframe...
df_new

,Entity,Year,cost_per_kg,launch_class,cost_per_lb
0,Angara,2014,4500,Heavy,2045.454545
1,Antares,2013,13600,Medium,6181.818182
2,Ariane 44,1988,18300,Medium,8318.181818
3,Ariane 5G,1997,10200,Heavy,4636.363636
4,Athena 1,1997,19200,Small,8727.272727
...,...,...,...,...,...
56,Titan III+,1965,21000,Medium,9545.454545
57,Titan IV,1989,30800,Heavy,14000.000000
58,Vega,2012,20000,Small,9090.909091
59,Zenit 2,1985,5100,Medium,2318.181818


In [78]:
# We can save the sorted dataframe as itself, or as a new one
df_new = df_new.sort_values(by=['cost_per_kg'], ascending=False)

In [79]:
df_new

,Entity,Year,cost_per_kg,launch_class,cost_per_lb
11,Delta E,1965,177900,Medium,80863.636364
46,Scout,1961,118500,Small,53863.636364
37,Minotaur I,2000,73100,Small,33227.272727
51,Space Shuttle,1981,65400,Heavy,29727.272727
41,Pegasus XL,1996,50600,Small,23000.000000
...,...,...,...,...,...
45,Saturn V,1967,5400,Heavy,2454.545455
59,Zenit 2,1985,5100,Medium,2318.181818
0,Angara,2014,4500,Heavy,2045.454545
20,Falcon 9,2010,2600,Medium,1181.818182
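
If you only need the extremes, sorting the whole dataframe is optional. A sketch with `nlargest` and `idxmin`, on a stand-in built from five rows of the sorted output above:

```python
import pandas as pd

df_sample = pd.DataFrame({
    "Entity": ["Angara", "Delta E", "Falcon 9", "Saturn V", "Scout"],
    "cost_per_kg": [4500, 177900, 2600, 5400, 118500],
})

# The two most expensive rows, already in descending order
top_two = df_sample.nlargest(2, "cost_per_kg")

# idxmin() returns the row label of the smallest value
cheapest = df_sample.loc[df_sample["cost_per_kg"].idxmin(), "Entity"]

print(top_two)
print(cheapest)
```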


### CHALLENGE (if you finish early): 
Go to [Our World in Data](https://ourworldindata.org/) and find a data set that interests you.

Download the CSV file.

Load it into this notebook.

(1) What do you notice about how the data is organized?

(2) Can you sort the data in an interesting way?

(3) Can you calculate a new column that is useful?

In [80]:
# TODO: Your code here....