# <font color=green> PYTHON FOR DATA SCIENCE
---

## <font color=green> 1. Libraries
---

## 1.1 Installing and importing libraries

The Python language uses the concept of libraries: collections of modules and functions that are useful to the user. Libraries save us from rewriting code that already exists, and importing only what we need keeps the project focused on the task at hand.

### Installing a library

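To install or update a library in Python, we can use `pip`, Python's package installer (`pip install --upgrade <library>` updates a library that is already installed). In a notebook, the `!` prefix runs the command in the system shell.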

In [None]:
# Installing the matplotlib library with pip
!pip install matplotlib

In [None]:
# Installing a specific version of matplotlib
!pip install matplotlib==3.6.2

There is also PyPI (the Python Package Index), the official repository of third-party Python libraries. It hosts the packages published by the community, along with information on how to use them and links to each one's documentation.

- PyPI ([https://pypi.org/](https://pypi.org/))

### Importing a library

In [9]:
# Importing a library without aliases
import matplotlib

In [None]:
# Checking the type of the imported object: a library is imported as a module
matplotlib.__class__

In [None]:
# Importing a library with aliases
import matplotlib.pyplot as plt

In [14]:
plt.show()

## 1.2 Using packages/libraries

- Python documentation (https://docs.python.org/pt-br/3/)

#### Example 1: Let's use the Matplotlib library to plot the averages of students in a class.

(https://matplotlib.org/stable/tutorials/introductory/pyplot.html)

In [15]:
import matplotlib.pyplot as plt

In [18]:
students = ["John", "Mary", "Joseph"]
grades = [8.5, 9, 6.5]

In [None]:
plt.bar(x=students, height=grades)

#### Example 2: Let's use the `random` library to randomly select a student to present their data science work

(https://docs.python.org/pt-br/3/library/random.html)

In [20]:
students_two = ["John", "Mary", "Joseph", "Anna"]

In [22]:
# Importing a specific function from a library
from random import choice

<font color=green>**Tip:**</font> You may have noticed throughout our practice how important it is to refer to the documentation to learn how to use a method or package in the Python language.

The `help()` function, for example, displays the documentation of an object such as a variable, function, method or class.

https://docs.python.org/pt-br/3/library/functions.html?#help

In [None]:
help(choice)

In [None]:
student = choice(students_two)
student

## <font color=green> 2. FUNCTIONS
---

In Python, **functions** are sequences of instructions that perform specific tasks and can be reused in different parts of the code. They can receive input parameters (which we can call *inputs*) and also return results.

## 2.1 Built-in functions

The Python interpreter has a number of built-in functions that are always available, without any import. Some that we will use throughout this course are `type()`, `print()`, `list()`, `zip()`, `sum()` and `map()`, among others.

***Documentation:***
https://docs.python.org/pt-br/3/library/functions.html


#### **Situation 1:**

The school where we are building our data case shared one student's grades so that we can calculate the student's average, rounded to one decimal place.

The data received corresponds to a dictionary with keys indicating the quarter in question and the student's grade values for each quarter in a given subject.

In [None]:
# Student's grades
grades = {"1st Quarter": 8.5, "2nd Quarter": 7.5, "3rd Quarter": 9}
grades

In [None]:
# Calculating the sum
addition = 0

for note in grades.values():
    addition += note

addition

In [None]:
# Using the sum() built-in function
summations = sum(grades.values())
summations

In [None]:
# Using the built-in len() function
amount_notes = len(grades)
amount_notes

In [None]:
# calculating the average
average = summations / amount_notes
average

*Round the average using round():*

https://docs.python.org/pt-br/3/library/functions.html#round

In [None]:
average = round(average, 1)
average
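A detail worth knowing about `round()`: exact ties are rounded to the nearest even number ("banker's rounding"), and because many decimals such as 2.675 have no exact binary float representation, a result can look off by one in the last digit. The Python documentation for `round()` describes both behaviors. A quick check, reusing the grades from Situation 1:

```python
# Exact ties (x.5) are rounded to the nearest even number
tie_results = [round(0.5), round(1.5), round(2.5)]
print(tie_results)  # [0, 2, 2]

# 2.675 is stored as a float slightly below 2.675, so it rounds down
print(round(2.675, 2))  # 2.67

# The average from Situation 1: (8.5 + 7.5 + 9) / 3 = 8.333...
situation_1 = round((8.5 + 7.5 + 9) / 3, 1)
print(situation_1)  # 8.3
```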

## 2.2 Creating functions

Once we've explored built-in functions and learned how to use some of them, you may find yourself needing to solve a specific problem where they won't be enough.

At that point, you'll need to create your own functions, especially if you need to use them in more parts of your code.

### Functions without parameters

#### Standard format:

```python
def <name>():
  <instructions>
```

In [12]:
# Creating a function to calculate the average

def calculate_average():
    calculation = (10 + 9 + 8) / 3
    print(calculation)

In [None]:
calculate_average()

### Functions with parameters

#### Standard format:

```python
def <name>(<param_1>, <param_2>, ..., <param_n>):
  <instructions>
```
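Arguments can be matched to parameters by position, as in the cells below, or by name (keyword arguments), in which case their order no longer matters. A small sketch, using a variant of `calculate_average` that *returns* the result instead of printing it so that the two calls can be compared:

```python
def calculate_average(note_one, note_two, note_three):
    # Variant for illustration: returns the mean instead of printing it
    return (note_one + note_two + note_three) / 3

# Positional arguments: matched to the parameters by order
by_position = calculate_average(10, 9, 8)

# Keyword arguments: matched by name, so the order can change
by_keyword = calculate_average(note_three=8, note_one=10, note_two=9)

print(by_position, by_keyword)  # 9.0 9.0
```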

In [1]:
def calculate_average(note_one, note_two, note_three):
    calculation = (note_one + note_two + note_three) / 3
    print(calculation)

In [None]:
calculate_average(10, 9, 8)

In [None]:
note_1 = 10
note_2 = 9
note_3 = 8

calculate_average(note_1, note_2, note_3)

#### **Situation 2:**

We have received a request to calculate a student's average from a list of grades, in a way that keeps working when the number of grades changes.

The data received this time corresponds to a list containing only a student's grades in a given subject.

**Let's solve this challenge!**

To make it easier to understand the process, we'll apply it to the grades of just one student, but you can try other cases to practice.

In [4]:
# Student's grades
grades = [8.5, 9.0, 6.0, 10.0]

In [5]:
def calculate_average(notes):
    # sum() and len() adapt to any number of grades
    average = sum(notes) / len(notes)
    print(average)

In [None]:
calculate_average(grades)

In [None]:
result = calculate_average(grades)

In [8]:
# Shows nothing: calculate_average() only prints, so it returns None
result

<font color=red>**Attention!**</font>
When we use functions, we need to pay attention to a property called the **scope of a function**.

It determines where a variable can be used within the code. For example, a variable created within a function will only exist within the function. In other words, once the function has finished executing, the variable will not be available to the user in the rest of the code.
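A minimal sketch of the scope rule: the variable created inside the function is not visible outside it, while the value the function *returns* can be kept (returning values is covered in the next section). The names `mean_of_notes` and `student_average` are just for illustration:

```python
def calculate_average(notes):
    # 'mean_of_notes' is local: it exists only while the function runs
    mean_of_notes = sum(notes) / len(notes)
    return mean_of_notes

# The returned value is available outside the function...
student_average = calculate_average([8.5, 9.0, 6.0, 10.0])
print(student_average)  # 8.375

# ...but the local variable itself is not
try:
    print(mean_of_notes)
    local_visible = True
except NameError as error:
    local_visible = False
    print(f"NameError: {error}")
```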

## 2.3 Functions that return values

#### Standard format:

```python
def <name>(<param_1>, <param_2>, ..., <param_n>):
  <instructions>
  return <result>
```

Going back to the previous activity, we can now return the average so that it can be stored in a variable and reused:

In [1]:
# Student's grades
grades = [8.5, 9.0, 6.0, 10.0]

In [2]:
def calculate_average(notes):
    # Parameter renamed from 'list' to avoid shadowing the built-in list()
    calculation = sum(notes) / len(notes)
    return calculation

In [None]:
calculate_average(grades)

#### **Situation 3:**

We have received a new request, this time to calculate a student's average from a list and return both the average and the student's status (“Passed” if the grade is greater than or equal to 6.0, otherwise it will be “Failed”). 

In addition, we need to display a short text indicating the student's average and status. The data received corresponds to a list containing only a student's grades in a given subject.

**Let's solve this challenge!**

To make it easier to understand the process, we'll apply it to the grades of just one student, but you can test other cases to practice.

In [4]:
# Student's grades
grades = [6.0, 7.0, 9.0, 5.0]

In [5]:
def school_report(notes):
    average = sum(notes) / len(notes)
    # Passing threshold from the request: 6.0
    if average >= 6.0:
        situation = "Passed"
    else:
        situation = "Failed"

    return (average, situation)

In [None]:
school_report(grades)

In [7]:
average, situation = school_report(grades)

In [None]:
average

In [None]:
situation

In [None]:
print(f'The student achieved an average of {average} and was {situation}.')

## 2.4 Lambda functions

Also called anonymous functions, lambdas are functions created without the `def` statement, so they have no name, and their body is a single expression written on one line.

https://docs.python.org/pt-br/3/reference/expressions.html?#lambda

#### Standard format:

```python
lambda <variable>: <expression>
```

#### **Situation 4:**

In this new request, we need to create a simple calculator for the weighted average of grades in a given subject. We will ask the user to enter the student's 3 grades (N1, N2, N3) and return the student's weighted average. The weights of the grades are 3, 2, 5 respectively.

We need to display a short text indicating the student's average.

**Shall we solve this challenge?**

In [None]:
# A regular (named) function that adds a qualitative bonus of 0.5 to a grade
grade = float(input("Enter the student's grade: "))

def qualitative(x):
    return x + 0.5

qualitative(grade)

In [None]:
# The same logic written as a lambda (anonymous) function
grade = float(input("Enter the student's grade: "))

qualitative = lambda x: x + 0.5
qualitative(grade)

**Moving on to our problem:**

In [None]:
# Receiving the grades and calculating the weighted average

g1 = float(input("Enter the student's 1st grade: "))
g2 = float(input("Enter the student's 2nd grade: "))
g3 = float(input("Enter the student's 3rd grade: "))

weighted_average = lambda x, y, z: (x * 3 + y * 2 + z * 5) / 10
average_student = weighted_average(g1, g2, g3)
average_student

In [None]:
# Displaying the average
print(f'The student achieved an average of {average_student}!')
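Because `input()` pauses the cell until the user types something, it can help to check the formula with fixed values first. A minimal sketch; the grades 7.0, 8.0 and 9.0 are arbitrary example values: 7.0 * 3 + 8.0 * 2 + 9.0 * 5 = 82, and 82 / 10 = 8.2.

```python
# Same weights as above: 3, 2 and 5 (they add up to 10)
weighted_average = lambda x, y, z: (x * 3 + y * 2 + z * 5) / 10

# Fixed example grades instead of input(), so the cell runs without a prompt
n1, n2, n3 = 7.0, 8.0, 9.0
example_average = weighted_average(n1, n2, n3)
print(f'The student achieved an average of {example_average}!')  # 8.2
```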

### Mapping values

#### Standard format:

```python
map(<function>, <iterable>)
```
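`map()` accepts any function (a lambda or a named function) and any iterable, and it returns a lazy `map` iterator rather than a list. That is why the solution below wraps it in `list()`, and also why a `map` object can be consumed only once. A short sketch with example grades:

```python
sample_grades = [6.0, 7.0, 9.0, 5.5, 8.0]

# map() returns a lazy iterator, not a list
mapped = map(lambda x: x + 0.5, sample_grades)
print(type(mapped))  # <class 'map'>

first_pass = list(mapped)   # consumes the iterator
second_pass = list(mapped)  # already exhausted, so this is empty
print(first_pass)   # [6.5, 7.5, 9.5, 6.0, 8.5]
print(second_pass)  # []

# Any function works, not only lambdas: here the built-in round()
rounded = list(map(round, [6.4, 7.6]))
print(rounded)  # [6, 8]
```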

#### **Situation 5:**

We've received another request, this time to create a small function that adds a qualitative score (extra points) to the quarterly grades of the students in the class that won the programming contest held by the school. Each student will receive a qualitative score of 0.5 added to their grade.

The data received corresponds to a list containing the grades of some students and a variable with the qualitative score received.

**Shall we solve this challenge?**

To make it easier to understand the process, we'll apply the qualitative to the grades of 5 students, but you can test other cases to practice.

In [3]:
# Students' grades
grade = [6.0, 7.0, 9.0, 5.5, 8.0]
qualitative = 0.5

In [None]:
# A lambda handles one value at a time, so to apply it to every element
# of a list we pass it to map(); list() collects the resulting values

updated_grades = list(map(lambda x: x + qualitative, grade))
updated_grades

## <font color=green> 3. COMPOSITE DATA STRUCTURES
---

## 3.1 Nested structures

We previously learned how to manipulate lists, tuples and dictionaries to work with a sequence or collection of values, whether numeric, categorical, etc. In this lesson, we're going to delve into another common situation for data scientists, which is working with these types of nested structures, i.e. when we have, for example, lists within a list.

### List of lists

#### Standard format:

```python
[[a1, a2,...,an], [b1, b2,...,bn], ..., [n1, n2,...,nn]]
```
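Elements of a list of lists are reached with two indexes: the first selects the inner list, the second selects a position inside it. A short sketch with example values:

```python
# Two students, three quarterly grades each (example values)
sample_grades = [[8.0, 9.0, 10.0], [9.0, 7.0, 6.0]]

first_student = sample_grades[0]         # the whole inner list
second_student_q3 = sample_grades[1][2]  # 2nd student (index 1), 3rd quarter (index 2)
print(first_student)      # [8.0, 9.0, 10.0]
print(second_student_q3)  # 6.0

# A nested loop visits every grade, student by student
all_grades = []
for student_grades in sample_grades:
    for grade in student_grades:
        all_grades.append(grade)
print(all_grades)  # [8.0, 9.0, 10.0, 9.0, 7.0, 6.0]
```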

#### **Situation 6:**

We received a list that mixes the students' names with their grades for three quarters. The task is to split it into a simple list of names and a list of lists in which each inner list holds one student's three grades. The data received corresponds to a list with each student's name followed by their respective grades.

**Let's solve this challenge!**

To make it easier to understand the process, let's work with a fictitious class of 5 students.


In [2]:
class_grades = ["John", 8.0, 9.0, 10.0, "Jason", 9.0, 7.0, 6.0, "Jenny", 3.4, 7.0, 7.0, "Ishaq", 5.5, 6.6, 8.0, "Adrian", 6.0, 10.0, 9.5]

In [8]:
names = []
grouped_grades = []

for item in class_grades:
    # Strings are names; every other value is a grade
    if isinstance(item, str):
        names.append(item)
    else:
        grouped_grades.append(item)

In [None]:
# Displaying the names
names

In [None]:
# Displaying the grades
grouped_grades

In [None]:
# Separating the grades
grades = []

for i in range(0, len(grouped_grades), 3):
    grades.append(grouped_grades[i:i+3])

grades

### List of tuples

#### Standard format:

```python
[(a1, a2,...,an), (b1, b2,...,bn), ..., (n1, n2,...,nn)]
```

#### **Situation 7:**

In this new request, we need to generate a list of tuples with the students' names and each student's ID code for the data analysis platform. The code is created by concatenating the first letter of the student's name with a random number from 0 to 999. The data received corresponds to a list of the students' names.

**Let's solve this challenge!**

To make it easier to understand the process, let's work with a fictitious class of 5 students.

In [None]:
# List of students
students = ["Adam", "Laila", "Jerome", "Olly", "Lance"]
students

In [16]:
# Generating a random code
from random import randint

def gen_code():
  return str(randint(0,999))

In [None]:
code_students = []

for student in students:
    # Each tuple: (name, first letter of the name + random number)
    code_students.append((student, student[0] + gen_code()))

code_students
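Since each element is a `(name, code)` tuple, a `for` loop can unpack it directly into two variables. A sketch that rebuilds the list by iterating over the names (the `gen_code()` rule is inlined, so codes are random on each run) and then unpacks each tuple:

```python
from random import randint

students = ["Adam", "Laila", "Jerome", "Olly", "Lance"]

code_students = []
for name in students:
    # Same rule as gen_code(): first letter + random number from 0 to 999
    code_students.append((name, name[0] + str(randint(0, 999))))

# Each tuple is unpacked into two variables inside the loop
for name, code in code_students:
    print(f"{name}: {code}")
```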

## 3.2 List comprehension

It's a simple and concise way of creating a list. We can apply conditionals and loops to create different types of lists based on the patterns we want for our data structure.

https://docs.python.org/pt-br/3/tutorial/datastructures.html?#list-comprehensions


#### Standard format:

```python
[expression for item in list]
```
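A list comprehension is a compact form of the "create an empty list, loop, append" pattern. A sketch comparing both versions on example grades:

```python
sample_grades = [6.0, 7.0, 9.0]

# Loop version: start empty and append one element per iteration
with_bonus_loop = []
for grade in sample_grades:
    with_bonus_loop.append(grade + 0.5)

# Comprehension version: the same expression and loop, in a single line
with_bonus_comp = [grade + 0.5 for grade in sample_grades]

print(with_bonus_loop)                     # [6.5, 7.5, 9.5]
print(with_bonus_loop == with_bonus_comp)  # True
```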

#### **Situation 8:**

We have been asked to create a list with the average grades of the students in the list of lists we created in Situation 6. Remember that each list in the list of lists has the three grades of each student.

**Let's solve this challenge!**

**Tip:** Use the format:
```python
[expression for item in list]
```

In [2]:
# Student's grades
grades = [[8.0, 9.0, 10.0], [9.0, 7.0, 6.0], [3.4, 7.0, 7.0], [5.5, 6.6, 8.0], [6.0, 10.0, 9.5]]

In [3]:
# Calculating the average
def average(notes: list = [0]) -> float:
    '''
    Calculate the average of a list of grades.

    notes: list, default [0]
        List of grades to average

    return: float
        The calculated average
    '''
    # Parameter renamed from 'list' to avoid shadowing the built-in list()
    calculation = sum(notes) / len(notes)

    return calculation

In [None]:
# Calculating each student's average, rounded to one decimal place
averages = [round(average(grade), 1) for grade in grades]
averages

#### **Situation 9:**

Now we need to use the averages calculated in the previous example, pairing them with the students' names. This will be necessary to generate a list that selects those students with a final average greater than or equal to 8 to compete for a scholarship for the next academic year. The data received corresponds to a list of tuples with the names and codes of the students and the list of averages calculated above.

**Let's solve this challenge.**

To make it easier to understand the process, let's work with a fictitious class of 5 students.

**Tip:** Use the format:
```python
[expr for item in list if cond]
```


In [1]:
# Defining the list of tuples with the names and ID codes
names = [('Poppy', 'P987'), ('Zanch', 'Z212'), ('Stacey', 'S973'), ('Jennifer', 'J645'), ('Tiffany', 'T472')]
averages = [9.0, 7.3, 5.8, 6.7, 8.5]

In [None]:
# Generating the list of names (extracting from the tuple)

names = [name[0] for name in names]
names

<font color=green>**Tip:**</font> To pair averages and names easily, we can use another built-in function: `zip()`.

It receives one or more iterables (list, string, dict, etc.) and returns an iterator of tuples, where the i-th tuple contains the i-th element of each iterable.

In [None]:
# Generating the list of students
students = list(zip(names, averages))
students
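One behavior to watch: when the iterables have different lengths, `zip()` stops at the shortest one without any warning (since Python 3.10, passing `strict=True` makes it raise a `ValueError` instead). A sketch with a deliberately missing average:

```python
short_names = ["Poppy", "Zanch", "Stacey"]
short_averages = [9.0, 7.3]  # one average missing on purpose

# zip() stops at the shortest iterable, silently dropping "Stacey"
paired = list(zip(short_names, short_averages))
print(paired)  # [('Poppy', 9.0), ('Zanch', 7.3)]

# zip() returns an iterator, so wrap it in list() to see or reuse the pairs
print(type(zip(short_names, short_averages)))  # <class 'zip'>
```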

In [None]:
# Generating the list of people applying for the scholarship
candidate = [student[0] for student in students if student[1] >= 8.0]
candidate

#### **Situation 10:**

We have received two requests regarding this project with students' grades:
- Create a list with each student's status: “Passed” if their average is greater than or equal to 6, and “Failed” otherwise.
- Generate a list of lists with:
  - List of tuples with the students' names and their codes
  - List of lists with each student's grades
  - List of averages for each student
  - List of students' status according to averages

The data we will use is the same as that generated in the previous situations (`names`, `grades`, `averages`).

**Let's solve this challenge!**

To follow along, here are the data structures we've already produced.

**Tip**: For the list of situations, use the format:
```python
[result_if if cond else result_else for item in list]
```

In [6]:
names = [("James", "J720"), ("Emma", "M205"), ("Michael", "J371"), ("Charlotte", "C546"), ("Alice", "A347")]
grades = [[8.0, 9.0, 10.0], [9.0, 7.0, 6.0], [3.4, 7.0, 7.0], [5.5, 6.6, 8.0], [6.0, 10.0, 9.5]]
averages = [9.0, 7.3, 5.8, 6.7, 8.5]

In [None]:
situation = ["Passed" if average >= 6.0 else "Failed" for average in averages]
situation

**Tip:** To generate the list of lists described in the request, we can use the following format:
```python
[expr for item in list of lists]
```

In [None]:
registration = [x for x in [names, grades, averages, situation]]
registration

<font color=green>**Tip:**</font> We can also build the list in the simplest way, writing the lists directly inside square brackets, with no expression or `for` loop as in a list comprehension.

In [None]:
complete_list = [names, grades, averages, situation]
complete_list

## 3.3 Dict comprehension

It's a simple and concise way of creating or modifying a dictionary. We can apply conditionals and loops to create different types of dictionaries based on the patterns we want for our data structure and with the support of iterables such as lists or sets.

https://peps.python.org/pep-0274/

#### Standard format:

```python
{key: value for item in list}
```
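A dict comprehension works like a list comprehension, but each iteration produces a `key: value` pair, and it can also take a condition. A sketch pairing example names and averages with `zip()`:

```python
student_names = ["James", "Emma", "Michael"]
student_averages = [9.0, 7.3, 5.8]

# One key: value pair for each (name, average) pair produced by zip()
report = {name: avg for name, avg in zip(student_names, student_averages)}
print(report)  # {'James': 9.0, 'Emma': 7.3, 'Michael': 5.8}

# A condition keeps only some of the pairs
seven_or_more = {name: avg for name, avg in report.items() if avg >= 7.0}
print(seven_or_more)  # {'James': 9.0, 'Emma': 7.3}
```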

#### **Situation 11:**

Our task now consists of generating a dictionary from the list of lists we created in Situation 10 to pass on to the person responsible for building the tables for data analysis.
- The keys of our dictionary will be the columns identifying the type of data
- The values will be the lists with the data corresponding to that key.

**Let's solve this challenge!**

To make it easier to understand the process, let's work with a fictitious class of 5 students.

**Tip:** Use the format:

```python
{key: value for item in list}
```

In [12]:
complete_list = [
    [("James", "J720"), ("Emma", "M205"), ("Michael", "J371"), ("Charlotte", "C546"), ("Alice", "A347")],
    [[8.0, 9.0, 10.0], [9.0, 7.0, 6.0], [3.4, 7.0, 7.0], [5.5, 6.6, 8.0], [6.0, 10.0, 9.5]],
    [9.0, 7.3, 5.8, 6.7, 8.5],
    ['Passed', 'Passed', 'Failed', 'Passed', 'Passed']
]

In [None]:
# Columns with data types (except name)
column = ["Grades", "Final average", "Situation"]

registration = {column[i]: complete_list[i+1] for i in range(len(column))}
registration

In [None]:
# Let's finally add the students' names, extracting only their names from the list of tuples
registration["student"] = [name for name, code in complete_list[0]]
registration


## <font color=green> 4. DEALING WITH EXCEPTIONS
---

You may have noticed along the way that some commands raised errors and exceptions when executed. As a data scientist or programmer, you need to be aware of these situations to avoid bugs in your code and analyses that could affect both the user experience and the efficiency of your analysis.

There are basically two distinct forms of error: syntax errors and exceptions.

Exceptions are errors detected during execution that break the flow of the program and terminate it if they are not dealt with.

We'll learn how to identify and handle some of the exceptions here, but it's always important to dive into the documentation to research and check which ones fit into your projects.

**Documentation on errors and exceptions** https://docs.python.org/3/tutorial/errors.html
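A short sketch of the difference: a syntax error is detected before any code runs (here triggered safely through `compile()` on a string with an unclosed parenthesis), while an exception is raised during the execution of valid code. The exact error messages vary between Python versions:

```python
# A syntax error is found before the code runs; compile() lets us trigger one safely
try:
    compile("print('Hello'", "<example>", "exec")
    syntax_error_caught = False
except SyntaxError as error:
    syntax_error_caught = True
    print("SyntaxError:", error.msg)

# An exception is raised while syntactically valid code is running
try:
    ratio = 10 / 0
    runtime_error_caught = False
except ZeroDivisionError as error:
    runtime_error_caught = True
    print("ZeroDivisionError:", error)
```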

## 4.1 Handling Exceptions

Exception handling establishes an alternative flow for code execution, preventing processes from being interrupted unexpectedly.

Python has many built-in exceptions. Depending on the behavior we want and the errors we need to handle, we can guide the user down an alternative path or provide more details about the exception.

- Hierarchy of Exceptions (https://docs.python.org/3/library/exceptions.html#exception-hierarchy)
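Because exceptions form a class hierarchy, an `except` clause for a parent class also catches its subclasses. A sketch using `LookupError`, the common parent of `KeyError` and `IndexError` (the dictionary and list are example data):

```python
# KeyError and IndexError both derive from LookupError, which derives from Exception
print(issubclass(KeyError, LookupError))    # True
print(issubclass(IndexError, LookupError))  # True
print(issubclass(LookupError, Exception))   # True

caught = []
lookups = [
    lambda: {"Ella": [8.4, 9.0, 4.9]}["Lucas"],  # missing dictionary key
    lambda: [8.4, 9.0, 4.9][10],                  # index out of range
]
for lookup in lookups:
    try:
        lookup()
    except LookupError as error:
        # One handler for the parent class catches both subclasses
        caught.append(type(error).__name__)

print(caught)  # ['KeyError', 'IndexError']
```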

### Try ... Except

```python
try:
  # code to be executed; if an exception is raised, the rest of this block is skipped
except <ExceptionName> as <variable>:
  # runs only if a matching exception was raised in the try block
```

#### **Situation 12:**

You have written code that reads a dictionary of students' grades and should return the list of grades of a given student.

If the student is not enrolled in the class, we must handle the exception so that the message “Student not registered” appears.

In this example, we will work with the `KeyError` exception, which is raised when we look up a key that does not exist in the dictionary and, if unhandled, interrupts this piece of code.

**Let's test this first exception handler!**

In [20]:
grades = {
    "Ella": [8.4, 9.0, 4.9],
    "James": [3.4, 7.3, 7.9],
    "Ethan": [6.6, 7.7, 4.7],
    "Ethan": [3.3, 4.4, 9.3],
    "Amelia": [3.1, 3.8, 8.8],
    "Alexander": [7.5, 6.6, 7.1],
    "Lucas": [7.3, 7.2, 9.2]
}

In [None]:
name = input("Enter the student's name: ")
result = grades[name]
result

In [None]:
try:
    name = input("Enter the student's name: ")
    result = grades[name]
except Exception as error:
    print(type(error), f"Error: {error}")

In [None]:
try:
    name = input("Enter the student's name: ")
    result = grades[name]
except KeyError:
    print(f"Student not registered: {name}")

### Adding `else`

```python
try:
  # code to be executed; if an exception is raised, the rest of this block is skipped
except:
  # runs only if an exception was raised in the try block
else:
  # runs only if no exception was raised in the try block
```

#### **Situation 13:**

You have written code that reads a dictionary of students' grades and should return the list of grades of a given student.

If the student is not enrolled in the class, we should handle the exception so that the message “Student not registered” appears; if no exception is raised, we should display the list with the student's grades.

In this example, we will work with the `KeyError` exception, which is raised when the name is not a key in the dictionary.

**Let's test this handler!**

In [1]:
grades = {
    "Ella": [8.4, 9.0, 4.9],
    "James": [3.4, 7.3, 7.9],
    "Ethan": [6.6, 7.7, 4.7],
    "Ethan": [3.3, 4.4, 9.3],
    "Amelia": [3.1, 3.8, 8.8],
    "Alexander": [7.5, 6.6, 7.1],
    "Lucas": [7.3, 7.2, 9.2]
}

In [None]:
try:
    name = input("Enter the student's name: ")
    result = grades[name]
except KeyError:
    print(f"Student not registered: {name}")
else:
    print(f"{name}'s grades: {result}")

### Adding `finally`

```python
try:
  # code to be executed; if an exception is raised, the rest of this block is skipped
except:
  # runs only if an exception was raised in the try block
else:
  # runs only if no exception was raised in the try block
finally:
  # always runs, whether or not an exception was raised
```

#### **Situation 14:**

You have written code that reads a dictionary of students' grades and should return the list of grades of a given student.

If the student is not enrolled in the class, we should handle the exception so that the message “Student not registered” appears; if no exception is raised, we should display the list with the student's grades. The message “End of the search!” should be displayed whether or not an exception is raised.

In this example, we will work with the `KeyError` exception, which is raised when the name is not a key in the dictionary.

**Let's test this handler!**

In [None]:
grades = {
    "Ella": [8.4, 9.0, 4.9],
    "James": [3.4, 7.3, 7.9],
    "Ethan": [6.6, 7.7, 4.7],
    "Ethan": [3.3, 4.4, 9.3],
    "Amelia": [3.1, 3.8, 8.8],
    "Alexander": [7.5, 6.6, 7.1],
    "Lucas": [7.3, 7.2, 9.2]
}

In [None]:
try:
    name = input("Enter the student's name: ")
    result = grades[name]
except KeyError:
    print(f"Student not registered: {name}")
else:
    print(f"{name}'s grades: {result}")
finally:
    print("End of the search!")
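To see the order in which the blocks run without typing names at a prompt, here is a sketch with a hypothetical helper `find_grades()` (not part of the original exercise) that records which blocks executed. Note that `finally` runs even when the `except` or `else` branch returns:

```python
def find_grades(grades, name, log):
    """Hypothetical helper: same lookup as the cell above, without input()."""
    try:
        result = grades[name]
    except KeyError:
        log.append("except")
        return None
    else:
        log.append("else")
        return result
    finally:
        # Runs in both cases, even though the branches above return
        log.append("finally")

sample_grades = {"Ella": [8.4, 9.0, 4.9], "Lucas": [7.3, 7.2, 9.2]}
log_found, log_missing = [], []

found = find_grades(sample_grades, "Ella", log_found)
missing = find_grades(sample_grades, "Olivia", log_missing)
print(found, log_found)      # [8.4, 9.0, 4.9] ['else', 'finally']
print(missing, log_missing)  # None ['except', 'finally']
```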

## 4.2 Raise

Another way of working with exceptions is to raise them yourself when your code detects a condition it should not accept.

To do this, you use the keyword `raise` along with the type of exception you want to throw and a message to display.

```python
raise ErrorName("desired_message")
```

#### **Situation 15:**

You have created a function to calculate a student's average in a given subject from a list of the student's grades.

You want to deal with 2 situations:
- If the list contains a non-numeric value, the average will not be calculated and the message “It was not possible to calculate the student's average. Only numeric values are accepted!” will be displayed.
- If the list has more than 4 grades, a `ValueError` exception will be raised stating that “The list cannot have more than 4 grades.”

The message “End of the search!” must be displayed whether or not an exception is raised.

**Let's solve this challenge!**

In [13]:
def average(notes: list = [0]) -> float:
    '''
    Calculate the average of a list of grades.

    notes: list, default [0]
        List of grades to average (at most 4)

    return: float
        The calculated average

    Raises ValueError if there are more than 4 grades; sum() raises
    TypeError if the list contains a non-numeric value.
    '''
    # Validate the size before doing any calculation
    if len(notes) > 4:
        raise ValueError("The list cannot have more than 4 grades.")

    calculation = sum(notes) / len(notes)

    return calculation

In [None]:
# Simulating the function with a list of grades with 4 values
grades = [6, 7, 8, 9]
result = average(grades)
result

In [None]:
# Simulating the function with a list of grades with 5 values, the raise is called
grades = [6, 7, 8, 9, 10]
result = average(grades)
result

In [None]:
# Simulating correct operation
try:
    grades = [6, 7, 8, 9]
    result = average(grades)
except TypeError:
    print("It was not possible to calculate the student's average. Only numeric values are accepted!")
except ValueError as error:
    print(error)
else:
    print(result)
finally:
    print("End of the search!")

In [None]:
# Simulating the error with more than 4 values in grades
try:
    grades = [6, 7, 8, 9, 10]
    result = average(grades)
except TypeError:
    print("It was not possible to calculate the student's average. Only numeric values are accepted!")
except ValueError as error:
    print(error)
else:
    print(result)
finally:
    print("End of the search!")