# Computational Module 2.2: Intermediate Python II

Please complete this notebook by filling in the cells provided.

For all problems that ask you to write out explanations and sentences, you must provide your answer in the designated space. Moreover, throughout this homework and all future ones, please be sure not to re-assign variables! For example, if you use <code>max_temperature</code> in your answer to one question, do not reassign it later on.

Directly sharing answers with your fellow ULAB colleagues is not okay, but discussing problems with them or with your mentors is encouraged. You should start early so that you have time to get help if you're stuck! Drop-in office hours staffed by ULAB computational scientists will be held periodically; please keep an eye on ULAB Slack for more information.

## Lesson Plan

In this lesson, you will...
- learn a bit more about dictionaries
- apply your new knowledge to a real data set!

In [None]:
#[[IMPORTANT]] change to your FULL name and run this cell! Otherwise grading won't work!
%env grade_name='Name'

In [None]:
# MAKE SURE TO RUN THIS CELL (It imports the autograder file)
!pip install slacker
import csModule2

### Dictionary-specific Methods

We went over the definition of dictionaries in Python and the basic actions that can be performed on them (adding, removing, or altering key-value pairs). In this subsection, we will introduce three methods that are often useful when you want to test conditions on keys and values and retrieve the pairs that satisfy them: <code>keys()</code>, <code>values()</code>, and <code>items()</code>.

As we learned in Module 1, a dictionary in Python looks like the following:

In [5]:
random_dict_1 = {"stars": 4, "planets": 4, 9: [45e2, 345e4, 3544]}

Let's review! Can you access the value of a certain key in <code>random_dict_1</code>? If you have trouble with this, please review the <b>Dictionaries</b> section from Module 1!

In [None]:
random_dict_1["stars"] # insert some key you choose in the ...

Run the cell below.

In [None]:
random_dict_1["ULAB is the greatest thing since sliced bread!"] # This key is probably not in your dictionary

As you can see from the erroring cell above, if you try to access a key that doesn't exist, Python raises a <code>KeyError</code>.
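If you are not sure whether a key exists, there are a few common ways to avoid the error. The sketch below uses a made-up dictionary named `inventory` (it is not part of this assignment) and shows three standard options: testing with `in`, using `.get()` with a default, and catching the exception.

```python
inventory = {"stars": 4, "planets": 4}  # hypothetical example dictionary

# Option 1: test for the key before indexing.
if "moons" in inventory:
    moons = inventory["moons"]
else:
    moons = 0

# Option 2: .get() returns a default value instead of raising KeyError.
comets = inventory.get("comets", 0)

# Option 3: catch the exception explicitly.
try:
    asteroids = inventory["asteroids"]
except KeyError:
    asteroids = 0

print(moons, comets, asteroids)
```

Which option is best depends on the situation: `.get()` is the most compact when a sensible default exists, while `try`/`except` is useful when a missing key should trigger different behavior.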

On another note, dictionaries are <i>mutable</i> data types, which means that you can change their elements. Want to...

... change a value that corresponds to a key $k$ to $v$?
<code>
dict[k] = v
</code>

... add a new key-value pair ($k, v$) to your dictionary?
<code>
dict[k] = v
</code>

... remove a key-value pair ($k, v$) from your dictionary?
<code>
dict.pop(k, None)
</code>
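In the snippets above, `dict` stands in for the name of your own dictionary. Here is a small sketch of all three operations on a made-up dictionary called `telescope` (the name and values are for illustration only):

```python
# Hypothetical dictionary used only for this illustration.
telescope = {"aperture_m": 2.4, "name": "A"}

telescope["aperture_m"] = 6.5   # existing key: the old value is overwritten
telescope["name"] = "B"         # existing key: overwritten as well
telescope["year"] = 2021        # new key: a new pair is added

removed = telescope.pop("year", None)    # removes the pair and returns its value
missing = telescope.pop("orbit", None)   # absent key: returns the default, no error

print(telescope)
print(removed, missing)
```

Passing `None` as the second argument to `pop` is what keeps the last call from raising a `KeyError` when the key is absent.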

Let's continue working with <code>solar_system</code>. What do the below code cells do?

In [8]:
solar_system = {"planets": ["Mercury", "Venus", "Earth", "Mars", "Jupyter ;)", "Saturn", "Uranus", "Neptune"], 
               "dwarf planets": ["Pluto"],
               "Earth (distance from sun in AUs)": 1,
               "Sun": "main-sequence"}

In [None]:
solar_system.keys() # Line 1

In [None]:
solar_system.values() # Line 2

In [None]:
solar_system.items() # Line 3

Line 1 returns a <code>dict_keys</code> object, a list-like view of all the keys in the dictionary <code>solar_system</code>. Line 2 returns a <code>dict_values</code> view of all the values in <code>solar_system</code>. Line 3 is a bit more complex: it returns a <code>dict_items</code> view of key-value pairs, each stored as a Python <i>tuple</i>. Remember that elements of a tuple can be accessed in the same way that list elements are accessed, using the format <code>tuple_ex[index]</code>. From Module 1, we know that a tuple is essentially an immutable list: you cannot edit its elements without encountering errors. For example:

In [None]:
# what does this line do?
tuple_ex = (4, 5, "cats", "b", 45.0)
tuple_ex[2]

In [None]:
# complete this line to access the 5th element in the tuple!
tuple_ex[...]
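To see the immutability for yourself, you can try assigning to a tuple element. This short sketch uses a separate made-up tuple named `point` so that it does not reassign anything from the cells above:

```python
point = (4, 5, "cats")  # hypothetical tuple for illustration

try:
    point[2] = "dogs"   # tuples do not support item assignment
except TypeError as err:
    message = str(err)  # keep the error text so we can look at it

print(message)
print(point)            # the tuple is unchanged
```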

Now, we know that line 3 returns a collection of tuples in which the first element of each tuple $t$ is the key and the second is the value.
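As a quick illustration, the sketch below loops over the pairs of a made-up dictionary called `ratings`. It first indexes into each tuple, then uses tuple unpacking, which is a common shorthand for the same thing.

```python
# Hypothetical dictionary for illustration.
ratings = {"coffee": 5, "tea": 3, "juice": 4}

for pair in ratings.items():
    key = pair[0]     # first element of the tuple is the key
    value = pair[1]   # second element is the value
    print(key, value)

# Tuple unpacking names both elements directly in the for statement.
labels = []
for drink, score in ratings.items():
    labels.append(f"{drink}: {score}")
print(labels)
```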

#### Checkpoint 9

What if you want to know if a certain value (in this case, "main-sequence") is in the dictionary? Run the following block of code. Does it do what you want it to do? How can you modify it so it answers the question?

In [None]:
for tup in solar_system.items():
    if tup[1] == "main-sequence":
        print("It's here!")

In [None]:
# no autograder test for this checkpoint!

Note that because the <code>.keys()</code>, <code>.values()</code>, <code>.items()</code> methods return list-<i>like</i> objects, you can iterate through them in the same way that you do lists.
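One difference from real lists is worth knowing: these view objects can be looped over, but they cannot be indexed. A minimal sketch, using a made-up dictionary named `sizes`:

```python
sizes = {"small": 1, "medium": 2, "large": 3}  # hypothetical dictionary

keys_view = sizes.keys()
for name in keys_view:           # iteration works directly on the view
    print(name)

try:
    keys_view[0]                 # views do not support indexing
except TypeError:
    indexable = False
else:
    indexable = True

first_key = list(keys_view)[0]   # convert to a real list when you need an index
print(indexable, first_key)
```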

#### Checkpoint 10

In [None]:
# a trick
# what does this line do?
"main-sequence" in solar_system.values()

In [None]:
# no autograder test for this checkpoint!

As can be seen from the above code cell, you don't always need to iterate through a list (or list-like object) to determine whether or not it contains an item, although sometimes iteration is necessary.
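Keep in mind what `in` actually searches. Used on the dictionary itself, it checks the keys; to check values or whole pairs you need `.values()` or `.items()`. The sketch below uses a small made-up dictionary, `star_types`, to show the difference.

```python
# Hypothetical dictionary for illustration.
star_types = {"Sun": "main-sequence", "Sirius B": "white dwarf"}

key_hit = "Sun" in star_types                               # searches the keys
value_as_key = "white dwarf" in star_types                  # a value is not a key
value_hit = "white dwarf" in star_types.values()            # searches the values
pair_hit = ("Sun", "main-sequence") in star_types.items()   # searches (key, value) pairs

print(key_hit, value_as_key, value_hit, pair_hit)
```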

## Exercises

For the following questions, we will be examining a database of known exoplanets from <a href="http://exoplanets.org">this link</a>.

In [13]:
# Don't worry about the code in the cells until the exercises begin. We'll explain what's going on in CS Module #3!
import pandas as pd, math
planets_data = pd.read_csv("exoplanets.csv")
planets_data.columns = ["name", "mass(mjupiter)", "semimajor_axis(AU)", "orbital_period(days)", 
                        "orbital_eccentricity", "omega(deg)", "time_of_periastron(jd)", 
                        "velocity_semiamplitude(m/s)", "orbit_reference", "orbit_url", "first_reference", "first_url"]
planets_data.drop(planets_data.index[0], inplace = True)
planets_data.head()

| | name | mass(mjupiter) | semimajor_axis(AU) | orbital_period(days) | orbital_eccentricity | omega(deg) | time_of_periastron(jd) | velocity_semiamplitude(m/s) | orbit_reference | orbit_url | first_reference | first_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Kepler-107 d | | 0.0780099 | 7.958203 | | 90 | 2454970.79968 | | Rowe 2014 | http://adsabs.harvard.edu/abs/2014arXiv1402.6534R | Rowe 2014 | http://adsabs.harvard.edu/abs/2014arXiv1402.6534R |
| 2 | Kepler-1049 b | | 0.0344721 | 3.27346074 | 0.0 | 90 | | | Morton 2016 | http://adsabs.harvard.edu/abs/2016ApJ...822...86M | Morton 2016 | http://adsabs.harvard.edu/abs/2016ApJ...822...86M |
| 3 | Kepler-813 b | | 0.13761 | 19.12947337 | 0.0 | 90 | | | Morton 2016 | http://adsabs.harvard.edu/abs/2016ApJ...822...86M | Morton 2016 | http://adsabs.harvard.edu/abs/2016ApJ...822...86M |
| 4 | Kepler-427 b | 0.310432 | 0.091351 | 10.290994 | 0.0 | 90 | 2454970.02207 | 29.8 | Hebrard 2014 | http://adsabs.harvard.edu/abs/2014A%26A...572A... | Borucki 2010 | http://adsabs.harvard.edu/abs/2010Sci...327..977B |
| 5 | Kepler-1056 b | | 0.185149 | 27.495606 | 0.0 | 90 | | | Morton 2016 | http://adsabs.harvard.edu/abs/2016ApJ...822...86M | Morton 2016 | http://adsabs.harvard.edu/abs/2016ApJ...822...86M |


In [50]:
# this cell converts the columns of the table we're looking at into lists we can use!
orbital_ecc = list(planets_data["orbital_eccentricity"])
orbital_per = list(planets_data["orbital_period(days)"])
semimajor_axis = list(planets_data["semimajor_axis(AU)"])
names = list(planets_data["name"])
orbit_references = list(planets_data["orbit_reference"])
planet_masses = list(planets_data["mass(mjupiter)"])

1) You want to find all of the planets with nonzero eccentricities. Write a block of code that outputs the <i>indices</i> of these planets in a list format.

You are given the list <code>orbital_ecc</code>, which holds various numeric values (mostly floats, some 0's, and some <code>nan</code>). NaN stands for "not a number"; here it means that we don't have data on the eccentricity of that particular planet. Do not include the indices of NaN entries in your final list of nonzero eccentricities. For this purpose, we have imported the Python <code>math</code> library for you, and you can use the <code>math.isnan(param)</code> function to check whether a certain value is NaN.
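Why is a dedicated function needed? NaN has the unusual property that it is not equal to anything, including itself, so an `==` comparison cannot detect it. The sketch below demonstrates this on a short made-up list called `readings` (not the exoplanet data).

```python
import math

readings = [0.0, 0.25, float("nan"), 0.1]  # hypothetical values

nan_value = readings[2]
print(nan_value == nan_value)   # comparing NaN with == does not work
print(math.isnan(nan_value))    # math.isnan is the reliable check

# Keep only the entries that are real numbers.
valid = [x for x in readings if not math.isnan(x)]
print(valid)
```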

In [55]:
def nonzero_ecc():
    # insert your answer for exercise 1 here
    planet_indices = []
    for i, ecc in enumerate(orbital_ecc):
        # keep the index only when the eccentricity is known (not NaN) and nonzero
        if ecc != 0.0 and not math.isnan(ecc):
            planet_indices.append(i)
    print(planet_indices)
    return planet_indices # you should return a list

In [None]:
# autograder cell: do not alter
csModule2.exercise1(nonzero_ecc)

2) Do any planets have a semimajor axis $\geq$ 1 AU? Write a block of code that outputs the names of these planets in a list if there are any that fulfill this requirement. Note that the list <code>semimajor_axis</code> contains the semimajor axis of each planet in AU and <code>names</code> contains the names of the planets. Both lists may contain NaN objects. Also, all of the data in the list <code>semimajor_axis</code> are in string format, so you can use the <code>float(param)</code> function to convert each entry to a float.

<i>Hint</i>: How can you build on top of your answer from the previous problem to find the <i>names</i> of the satisfying planets?
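The general idea behind the hint is that of <i>parallel lists</i>: entry `i` of one list describes the same object as entry `i` of another. The sketch below shows the pattern on made-up lists (`cities` and `temps` are hypothetical and unrelated to the exoplanet data), with the numbers stored as strings.

```python
# Hypothetical parallel lists: entry i of each list describes the same city.
cities = ["Oslo", "Cairo", "Lima"]
temps = ["4.5", "31.0", "19.2"]   # numbers stored as strings

warm_cities = []
for i in range(len(temps)):
    # convert the string to a float before comparing
    if float(temps[i]) >= 19.0:
        warm_cities.append(cities[i])   # same index, other list

print(warm_cities)
```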

In [None]:
def semi_axis_1():
    # insert your answer for exercise 2 here
    ...
    planet_names = [...]
    ...
    return planet_names # you should return a list

In [None]:
# autograder cell: do not alter
csModule2.exercise2(semi_axis_1)

3) You want to find the first planet with a mass less than 0.03 Jupiter masses. You may use the <code>planet_masses</code> and <code>names</code> lists; keep in mind that some values in the lists may be NaN or <code>None</code>.
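Finding the <i>first</i> match means you can stop looping as soon as one is found. Here is a minimal sketch of that pattern on made-up data; the lists `labels` and `weights` and the threshold 0.5 are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical parallel lists with some missing entries.
labels = ["a", "b", "c", "d", "e"]
weights = [float("nan"), None, 2.5, 0.4, 0.1]

first_light = None
for label, weight in zip(labels, weights):
    # skip missing data; check None first so math.isnan never receives None
    if weight is None or math.isnan(weight):
        continue
    if weight < 0.5:
        first_light = label
        break   # stop at the first match

print(first_light)
```

Without the `break`, the loop would keep going and `first_light` would end up holding the <i>last</i> match instead of the first.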

In [None]:
def less_003():
    # insert your answer for exercise 3 here
    ...
    planet_name = ...
    return planet_name # you should return a string with the name of the first planet that has a mass less than 0.03 Jupiter masses

In [None]:
# autograder cell: do not alter
csModule2.exercise3(less_003)

4) What is the average orbital period of the planets in this database? Remember: average only numerical values. Do not count NaN or None values when dividing by the total number of planets because they are not contributing useful data points. You may use the <code>orbital_per</code> list.

In [None]:
def avg_orb_per():
    # insert your answer for exercise 4 here
    total = 0.0
    count = 0
    for period in orbital_per:
        if period is None:
            continue  # skip missing entries before attempting a conversion
        value = float(period)  # entries may be stored as strings
        if not math.isnan(value):
            total += value
            count += 1
    return total / count # you should return a numeric value

avg_orb_per()

In [None]:
# autograder cell: do not alter
csModule2.exercise4(avg_orb_per)

5) <b>Challenge</b>: Who is the most common author of the papers that serve as orbit references for the exoplanets? You may use the <code>orbit_references</code> list to conduct your analysis. Remember to take into account that a planet may not have an orbit reference (in which case the default value would be <code>None</code>)!

<i>Hint</i>: You can iterate through the elements of a dictionary by using the method <code>dict_name.items()</code> to create a list-like object with each of the pairs from the dictionary as an element.
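A common way to find the most frequent item in a list is to count occurrences in a dictionary and then scan its pairs for the largest count. The sketch below shows that approach on a made-up list named `votes`; it is one possible pattern, not the only way to solve the exercise.

```python
# Hypothetical data with a missing entry.
votes = ["red", "blue", None, "red", "green", "blue", "red"]

counts = {}
for vote in votes:
    if vote is None:
        continue                             # ignore missing entries
    counts[vote] = counts.get(vote, 0) + 1   # start at 0 the first time a key is seen

winner = None
best = 0
for color, n in counts.items():
    if n > best:                             # keep the pair with the largest count so far
        winner, best = color, n

print(counts)
print(winner)
```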

In [None]:
def most_common_author():
    # insert your answer for exercise 5 here
    ...
    author = ...
    return author # you should return a value of type string

In [None]:
# autograder cell: do not alter
csModule2.exercise5(most_common_author)

In [None]:
# Run all tests at once! You can screenshot the output of this cell and submit to the lab manager.
csModule2.test_all(find_max_sum, count_match, partial_factorial, final_list, final_str, new_final_str,
                  flatten, original_lst, nonzero_ecc, semi_axis_1, less_003, avg_orb_per)

## Summary

In this lesson we learned...
- what functions, lambdas, and nested functions are, and how to create them
- more about Python data structures, specifically lists, strings, and dictionaries, including how to format them and use their type-specific methods
- how to construct lists concisely using list comprehensions
- how to use the <code>map</code>, <code>filter</code>, and <code>reduce</code> functions to avoid running too many / nested for loops