# JSON and Recursion
Agenda today:
- Introduction to JSON (JavaScript Object Notation)
    - Working with JSON in Python
    - Working with a real-world dataset 
- Recursive functions
    - What is recursion?
    - Working with recursion in Python
    - Advantages of recursion



Students will be able to...
- Parse and iterate through a JSON file using the `json` library, and turn JSON into a pandas DataFrame
- Explain the difference between a recursive and an iterative function

#### Why JSON/XML?
So far, we have worked with clean tabular data in which rows are observations and columns are features, such as our housing data. However, data collected from the web (through web scraping or an API call) are usually messy and arrive as JSON or XML. As data scientists, we are expected to know how to turn messy nested dictionaries into tables for querying, cleaning, and modeling.

### Part I. JSON - JavaScript Object Notation

In [None]:
import json
import pandas as pd

In [None]:
ls

#### The JSON library 
- `json.load()` - parses JSON from an open file object
- `json.loads()` - parses JSON from a string (the "s" stands for "string")
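
The difference between the two functions can be seen in a minimal sketch. The `raw` string is a made-up sample, and `io.StringIO` stands in for a file opened with `open()`, so nothing has to exist on disk.

```python
import io
import json

# Made-up sample record, used only for illustration
raw = '{"name": "Matt", "project_grades": ["A", "A-"]}'

# json.loads parses a str that is already in memory
from_string = json.loads(raw)

# json.load reads from a file-like object; io.StringIO plays the role
# of a file handle returned by open(path)
from_file = json.load(io.StringIO(raw))

print(type(from_string))                  # a JSON object becomes a Python dict
print(from_string == from_file)           # both routes give the same result
print(from_string["project_grades"][0])   # a JSON array becomes a Python list
```

Either way the result is ordinary Python data (dicts, lists, strings, numbers), so the usual indexing and looping tools apply.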

<img src="attachment:Screen%20Shot%202019-05-14%20at%2011.24.59%20AM.png" width="400">

#### Example 1 - Students data

In [None]:
# load this json data using the json library 
student_json = '''
{
  "students": [
    {
      "name": "Matt",
      "major": "Finance",
      "project_grades": ["A","A-","A-","B+","A"]
    },
    {
      "name": "Natalie",
      "major": ["Neuroscience","Psychology"],
      "project_grades": ["A-","B+","A","B","A+"]
    },
    {
      "name": "Remy",
      "major": "Economics",
      "project_grades": ["A-","A-","B+","B+","A"]
    }]
}
'''

In [None]:
# Printed as-is, this string is hard to read because it is full of newline characters. You could
# clean it up by hand, for example with regular expressions, but the json library in Python
# parses it for us with ease


In [None]:
# load this string using the json library and examine it


In [None]:
# loop through the parsed data


In [None]:
# examine the first element


In [None]:
# examining the name


In [None]:
# extract information and put them all in a list 
names = []
majors = []
project_grades = []


In [None]:
print(names)
print(majors)
print(project_grades)
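
One possible way to fill in the cells above is sketched below. It uses a shortened copy of the student data (`sample_json`, a name introduced here) so that it runs on its own; with the full `student_json` string the steps would be the same.

```python
import json

# Shortened copy of the student data so the sketch is self-contained
sample_json = '''
{
  "students": [
    {"name": "Matt", "major": "Finance", "project_grades": ["A", "A-"]},
    {"name": "Natalie", "major": ["Neuroscience", "Psychology"], "project_grades": ["A-", "B+"]}
  ]
}
'''

students = json.loads(sample_json)    # a dict with a single key, "students"

names = []
majors = []
project_grades = []

# students["students"] is a list of dicts, one per student
for student in students["students"]:
    names.append(student["name"])
    majors.append(student["major"])
    project_grades.append(student["project_grades"])

print(names)
print(majors)
print(project_grades)
```

Note that `major` is a string for one student and a list for the other, which is typical of the irregularities found in JSON data.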

In [None]:
# what if not all lines follow the same pattern? Use exception handling!
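
A minimal sketch of that idea, assuming a hypothetical record that is missing its `major` key: the `try`/`except KeyError` keeps the loop running and records `None` for the missing value.

```python
# Hypothetical records; the second one has no "major" key
records = [
    {"name": "Matt", "major": "Finance"},
    {"name": "Remy"},
]

majors = []
for record in records:
    try:
        majors.append(record["major"])
    except KeyError:
        # keep the lists aligned by storing a placeholder
        majors.append(None)

print(majors)
```

For a plain missing key, `record.get("major")` does the same job in one line; `try`/`except` is more useful when several different things can go wrong in the same block.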

In [None]:
# turn the parsed json dictionary (students, from json.loads above) into a pandas dataframe
df = pd.DataFrame(students['students'])

In [None]:
# expand the list in project_grades into its own columns
# i.e. expand df.project_grades into its own dataframe


In [None]:
# rename the grades df to have more discernible column names


In [None]:
# view the new project grade df 

In [None]:
# rename the columns

# view the grade dataframe


In [None]:
#dropping the column that we don't need

In [None]:
# concat the two datasets
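
The expand, rename, drop, and concat steps above could look like the following sketch. The two records and the `project_1`, `project_2`, ... column names are assumptions made for illustration; this is one approach among several.

```python
import pandas as pd

# Small stand-in for students['students']
records = [
    {"name": "Matt", "major": "Finance", "project_grades": ["A", "A-", "B+"]},
    {"name": "Remy", "major": "Economics", "project_grades": ["A-", "B+", "A"]},
]
df = pd.DataFrame(records)

# Expand: one column per list element, reusing df's index so rows line up
grades = pd.DataFrame(df["project_grades"].tolist(), index=df.index)

# Rename: replace the default 0, 1, 2 labels with readable names
grades.columns = [f"project_{i + 1}" for i in range(grades.shape[1])]

# Drop the original list column, then concat the two frames side by side
result = pd.concat([df.drop(columns="project_grades"), grades], axis=1)
print(result)
```

This works cleanly because every student has the same number of grades; lists of unequal length would leave missing values in the shorter rows.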

#### Example 2 - States data

In [None]:
# open the file for reading (and later writing) with a context manager

In [None]:
# examine the first element of the states data 


In [None]:
# printing out all the information in the json state data


In [None]:
# printing out all of the information of the data 

In [None]:
# deleting area code because we don't need it 


In [None]:
# saving the results back as a json using the json.dump() method


In [None]:
# turn our new states data with no area code into a dataframe

States Data Credit: [Corey Schafer](https://github.com/CoreyMSchafer/)
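
The steps in this example can be sketched without the original file. The `data` dictionary below is a hypothetical stand-in: the key names (`states`, `name`, `abbreviation`, `area_codes`) and the output file name are assumptions, and the real file may be laid out differently.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the contents of the states file
data = {
    "states": [
        {"name": "Alabama", "abbreviation": "AL", "area_codes": ["205", "251"]},
        {"name": "Alaska", "abbreviation": "AK", "area_codes": ["907"]},
    ]
}

# Delete the area codes; pop(..., None) does not fail if the key is absent
for state in data["states"]:
    state.pop("area_codes", None)

# Save the result with json.dump, inside a context manager
path = os.path.join(tempfile.mkdtemp(), "new_states.json")
with open(path, "w") as f:
    json.dump(data, f, indent=2)

# Read it back to check that the round trip preserved the data
with open(path) as f:
    reloaded = json.load(f)

print(reloaded["states"][0])
```

From here, `pd.DataFrame(reloaded["states"])` would give one row per state.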

#### Example 3 - Real data: the Biathlon data
Data description:
Hans is the coach of the Swedish Women's National Biathlon team, which is training for the upcoming winter season. Because he could not travel with the team to their training camp in Canada, he has a problem: some team members seem to be cheating on the training schedule they agreed on, which was designed to ensure that the athletes improve consistently leading up to the first competition. To track progress on their rifle shooting, the athletes have to write their names on each target board. This week, Hans's assistant sent him the scanned reports from Canada, but many of them don't have the athletes' names on the target boards, so Hans can't judge his team's progress. He turns to you for help in building a classifier, based on the named reports, that he can use to generate predictions for the reports without names. He keeps some named reports as test data and, depending on the accuracy of your classifier on that test data, Hans will invite you to the World Cup finale this winter.

Please send back a JSON file in the same format, with each empty name string replaced by the name of a team member, as well as a Jupyter notebook that documents and explains your approach.

In [None]:
with open('biathlon_data.json','r') as f:
    file = json.load(f)

In [None]:
file['silhouette_targets'][0]

#### In-class exercise:
- Using what you know about parsing JSON, put the file into a legible format
- Think about the best way to organize this data for machine learning tasks such as classification

### Part II. Working with Recursive Functions
Recursion is the process of defining something in terms of itself. A recursive function calls itself on a smaller version of the problem until it reaches a base case that can be answered directly.


<img src="attachment:Screen%20Shot%202019-05-14%20at%2012.34.04%20PM.png" width="400">

An example of a recursive function: the factorial of n

In [None]:
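def calc_factorial(x):
    """Recursively compute the factorial of a non-negative integer."""
    if x < 0:
        raise ValueError("factorial is not defined for negative integers")
    # base case: 0! and 1! are both 1, so the recursion stops here
    if x <= 1:
        return 1
    # recursive case: x! = x * (x - 1)!
    return x * calc_factorial(x - 1)

num = 4
print("The factorial of", num, "is", calc_factorial(num))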

#### Comparison between iterative and recursive functions

In [None]:
#### Iterative function
grades = ["A+","B-","A+","C+","A-","A"]

def print_grades_iteratively():
    for grade in grades:
        print("The student received", grade,"on their homework")

In [None]:
print_grades_iteratively()

In [None]:
#### Recursive function
grades = ["A+","B-","A+","C+"]

# Each function call is one "printer" handling its share of the list
def print_grades_recursively(grades):
    # Nothing to print: without this check an empty list would recurse forever
    if not grades:
        return

    # Base case: a worker printer handles a single grade
    if len(grades) == 1:
        print("The student received", grades[0], "on their homework")

    # Recursive case: a manager printer splits the list in half
    else:
        mid = len(grades) // 2
        first_half = grades[:mid]
        second_half = grades[mid:]

        # and delegates each half to another printer
        print_grades_recursively(first_half)
        print_grades_recursively(second_half)

print_grades_recursively(grades)

### Advantages and Disadvantages of Recursive Functions:

#### Advantages: 
- Recursive code is often short and mirrors the definition of the problem, as with factorial.
- A complex task can be broken down into simpler sub-problems using recursion.
- Sequence generation is often easier with recursion than with nested loops.
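
Nested JSON is a natural fit for the "simpler sub-problems" idea, because every value inside a JSON document is itself a smaller JSON value. The sketch below is an illustration only: `collect_strings` is a name introduced here, and the `year` field is a made-up addition to the student record.

```python
def collect_strings(node):
    """Return every string found in a nested structure of dicts and lists."""
    if isinstance(node, str):
        return [node]                  # base case: a single string
    if isinstance(node, dict):
        children = node.values()       # recurse into each value
    elif isinstance(node, list):
        children = node                # recurse into each element
    else:
        return []                      # numbers, booleans, None: nothing to collect

    found = []
    for child in children:
        found.extend(collect_strings(child))
    return found


# "year" is a made-up field, added to show a non-string value being skipped
student = {"name": "Natalie", "major": ["Neuroscience", "Psychology"], "year": 2}
print(collect_strings(student))   # ['Natalie', 'Neuroscience', 'Psychology']
```

The function never needs to know how deeply the data is nested; each call only handles one level and hands the rest to itself.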

#### Disadvantages:
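- Sometimes the logic behind recursion is hard to follow.
- Recursive calls can be expensive: each call adds a frame to the call stack, so deep recursion uses more memory and time than an equivalent loop, and Python raises a `RecursionError` once its recursion limit is exceeded.
- Recursive functions can be hard to debug.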
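
The cost of deep recursion can be made concrete with a small sketch. The two helper functions are introduced here for illustration; both add up the integers from 1 to `n`, one recursively and one with a loop.

```python
import sys


def sum_recursive(n):
    """Sum 1..n recursively; needs roughly n nested calls."""
    if n == 0:
        return 0
    return n + sum_recursive(n - 1)


def sum_iterative(n):
    """Sum 1..n with a loop; uses a constant amount of stack."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


# Ask for more nested calls than the interpreter allows
depth = sys.getrecursionlimit() + 100

try:
    sum_recursive(depth)
    hit_limit = False
except RecursionError:
    hit_limit = True

print("Recursive version hit the limit:", hit_limit)
print("Iterative version:", sum_iterative(depth))
```

The limit can be raised with `sys.setrecursionlimit()`, but for long, linear problems like this one a loop is usually the safer choice.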

Resources:
- [recursion](https://realpython.com/python-thinking-recursively)