# Google Colab and Python Basics

### Getting Started

The document you are reading is not a static web page, but an interactive environment called a **Colab notebook** that lets you write and execute code.

For example, here is a **code cell** with a short Python script that computes a value, stores it in a variable, and prints the result:



In [1]:
seconds_in_a_day = 24 * 60 * 60
seconds_in_a_day

86400

To execute the code in the cell above, click it to select it, then either press the play button to the left of the code or use the keyboard shortcut "Command/Ctrl+Enter". To edit the code, click the cell and start typing.

Variables that you define in one cell can later be used in other cells:

In [2]:
seconds_in_a_week = 7 * seconds_in_a_day
seconds_in_a_week

604800

Colab notebooks allow you to combine **executable code** and **rich text** in a single document, along with **images**, **HTML**, **LaTeX** and more. When you create your own Colab notebooks, they are stored in your Google Drive account. You can easily share your Colab notebooks with co-workers or friends, allowing them to comment on your notebooks or even edit them.
To learn more, see [Overview of Colab](/notebooks/basic_features_overview.ipynb).

To create a new Colab notebook you can use the File menu above, or use the following link: [create a new Colab notebook](http://colab.research.google.com#create=true).

Colab notebooks are Jupyter notebooks that are hosted by Colab.

To learn more about the Jupyter project, see [jupyter.org](https://www.jupyter.org).

You can import your own data into Colab notebooks from your Google Drive account, including from spreadsheets, as well as from GitHub and many other sources.

In [None]:
import numpy as np
import IPython.display as display
from matplotlib import pyplot as plt
import io
import base64

ys = 200 + np.random.randn(100)
x = [x for x in range(len(ys))]

fig = plt.figure(figsize=(4, 3), facecolor='w')
plt.plot(x, ys, '-')
plt.fill_between(x, ys, 195, where=(ys > 195), facecolor='g', alpha=0.6)
plt.title("Sample Visualization", fontsize=10)

data = io.BytesIO()
plt.savefig(data)
image = F"data:image/png;base64,{base64.b64encode(data.getvalue()).decode()}"
alt = "Sample Visualization"
display.display(display.Markdown(F"""![{alt}]({image})"""))
plt.close(fig)

This code is a Python script that generates a plot using Matplotlib, encodes the plot as a base64 image, and then displays it in a Jupyter notebook using Markdown.

Let's see what each part of the code does:

### 1. Importing Libraries

In [None]:
import numpy as np
import IPython.display as display
from matplotlib import pyplot as plt
import io
import base64


In Python, you import libraries using the `import` statement. This allows you to use the functions and classes provided by the library in your code.

- Basic Syntax
```python
import library_name
```

**Example**
```python
import math
```
This imports the `math` library. Its functions are then available under the library name, for example `math.sqrt(16)`.

-  Importing Specific Functions or Classes
```python
from library_name import specific_function
```

**Example**
```python
from math import sqrt
```
This imports only the `sqrt` function from the `math` library.

-  Importing with an Alias
```python
import library_name as alias
```

**Example**
```python
import pandas as pd
```
This imports the `pandas` library and allows you to use it with the alias `pd`.

This makes your code cleaner and easier to manage, especially with larger libraries.

--------

### 2. Generating Data

After loading libraries, we generate data:

In [None]:
ys = 200 + np.random.randn(100)
x = [x for x in range(len(ys))]
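To make these two lines concrete, here is a minimal sketch of what they produce. `np.random.randn(100)` draws 100 samples from a standard normal distribution (mean 0, standard deviation 1), so adding 200 centers the values around 200. The fixed seed and the alternative ways of building `x` are additions for illustration and are not part of the original cell.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the run is repeatable

ys = 200 + np.random.randn(100)   # 100 noisy values centered on 200
x = [x for x in range(len(ys))]   # positions 0, 1, ..., 99 for the x-axis

print(len(ys), len(x))            # both have 100 elements
print(x[:5])                      # first five x positions
print(round(ys.mean(), 1))        # sample mean, close to 200

# Equivalent, shorter ways to build the same x values
x_alt = list(range(len(ys)))
x_np = np.arange(len(ys))
```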


### Loading data

We can also use data from external files and in this case we typically use specialized libraries depending on the file format.

For example, to load `.csv` files we use the `pandas` library, which we will also rely on for most data wrangling and preprocessing.

*Note:*

`Pandas` is a powerful and flexible open-source data analysis and manipulation library for Python. It provides data structures and functions needed to work with structured data seamlessly, making it a go-to tool for data analysis and ML projects.

To load a `.csv` file that is in the same folder as the notebook file, we use:

```python
import pandas as pd

data = pd.read_csv('file.csv')
```

`pd.read_csv('file.csv')`: Reads the CSV file and loads it into a pandas DataFrame.
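If you do not have a CSV file at hand, you can try `pd.read_csv` on an in-memory text buffer. The sketch below assumes a small made-up table (the column names and values are hypothetical) and uses `io.StringIO` in place of a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file such as 'file.csv'
csv_text = """name,age
Alice,25
Bob,30
"""

data = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts file-like objects

print(type(data))          # a pandas DataFrame
print(data.shape)          # (rows, columns)
print(list(data.columns))  # column names taken from the first line
```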

We can also use `pandas` to load an Excel file:

```python
import pandas as pd

data = pd.read_excel('file.xlsx')
```

To load a plain text file, use Python's built-in `open()` function.

```python
with open('file.txt', 'r') as file:
    data = file.read()
```
`file.read()`: Reads the entire content of the file as a string.
We will talk about data types in Python in a minute!
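The following sketch shows the difference between reading a whole file at once and reading it line by line. It first writes a small temporary file so that it runs on its own; the file name and its contents are made up for the example.

```python
import os
import tempfile

# Create a small text file to read back (hypothetical content)
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="utf-8") as file:
    file.write("first line\nsecond line\n")

# Read the entire file into one string
with open(path, "r", encoding="utf-8") as file:
    data = file.read()

# Read the file into a list with one string per line
with open(path, "r", encoding="utf-8") as file:
    lines = [line.rstrip("\n") for line in file]

print(type(data))   # <class 'str'>
print(lines)        # ['first line', 'second line']
```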

If the file is stored on your local machine, you can upload it to Colab first:

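```python
from google.colab import files
import pandas as pd

uploaded = files.upload()  # Opens a file picker; the file is saved to the working directory

data = pd.read_csv('file.csv')  # Use the name of the file you uploaded
```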

### Task

*Music & Mental Health Survey Results*

Background:

Music therapy, or MT, is the use of music to improve an individual's stress, mood, and overall mental health. MT is also recognized as an evidence-based practice, using music as a catalyst for "happy" hormones such as oxytocin.

However, MT employs a wide range of different genres, varying from one organization to the next.

The MxMH dataset aims to identify what, if any, correlations exist between an individual's music taste and their self-reported mental health. Ideally, these findings could contribute to a more informed application of MT or simply provide interesting insights about the mind.

1. Review the dataset [here](https://www.kaggle.com/datasets/catherinerasgaitis/mxmh-survey-results)

2. Download the file with the data from [here](https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fqr0dOvSort5I2CoftsCl%2Fuploads%2FncLuwVPkCkSRyB3Mzlmj%2Fmxmh_survey_results.csv?alt=media&token=a0c509e8-f74f-41ef-b9d3-cc8188f7df61)

Use File -> Save As to save the file to your computer and review it locally.

3. Create a new Jupyter notebook, import the required libraries (e.g. pandas), and load the dataset using the link from step 2.

In [4]:
# Solution
import pandas as pd

survey = pd.read_csv("https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fqr0dOvSort5I2CoftsCl%2Fuploads%2FncLuwVPkCkSRyB3Mzlmj%2Fmxmh_survey_results.csv?alt=media&token=a0c509e8-f74f-41ef-b9d3-cc8188f7df61")

### Data types

In Python, data can be stored in different types of variables.

Common data types include:

- **Integers**: Whole numbers (e.g., 10)
- **Floats**: Numbers with decimals (e.g., 20.5)
- **Strings**: Sequences of characters (e.g., "Hello, World!")
- **Lists**: Ordered collections of items (e.g., [1, 2, 3])
- **Tuples**: Immutable ordered collections (e.g., (6, 7, 8))
- **Dictionaries**: Collections of key-value pairs (e.g., {"name": "Alice", "age": 30})

Some data types that may be new to you are lists, tuples, and dictionaries.

Let's take a closer look at lists and tuples.

**Lists**

*Definition*: Lists are ordered collections of items that can be of different types. Lists are mutable, meaning they can be changed after creation.

*Examples*: `[1, 2, 3], ["apple", "banana", "cherry"], [1, "hello", 3.14]`

*Use*: Lists are used for storing collections of items where the order matters and where you might need to modify the collection.

**Creating a list**
```python
my_list = [1, 2, 3, 4, 5]
print("Original List:", my_list)
```

**Modifying the list**
```python
my_list.append(6)  # Adding an element to the end of the list
print("List after appending 6:", my_list)

my_list[0] = 10  # Changing the first element
print("List after changing the first element:", my_list)

# Removing an element from the list
my_list.remove(3)
print("List after removing 3:", my_list)
```

**Tuples**

*Definition*: Tuples are ordered collections of items similar to lists, but they are immutable, meaning they cannot be changed after creation.

*Examples*: `(1, 2, 3), ("apple", "banana", "cherry"), (1, "hello", 3.14)`

*Use*: Tuples are used for storing collections of items where the order matters, and you want to ensure the data remains constant.
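A short sketch of working with tuples, mirroring the list example above. Reading elements works the same way as with lists, while an attempt to change an element raises a `TypeError`. The variable names are illustrative.

```python
my_tuple = (6, 7, 8)
print("Original tuple:", my_tuple)
print("First element:", my_tuple[0])   # Indexing works as with lists
print("Length:", len(my_tuple))

# Tuples are immutable: assigning to an element raises TypeError
try:
    my_tuple[0] = 10
except TypeError as error:
    print("Cannot modify a tuple:", error)

# To "change" a tuple, build a new one instead
new_tuple = my_tuple + (9,)
print("New tuple:", new_tuple)
```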


**Task**:

1. Create variables of different data types.
2. Print the variables and their types.
3. Perform basic operations on these variables.
4. Print the results of the operations.
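One possible starting point for this task is sketched below. The variable names and values are arbitrary examples; `type()` is the built-in function that reports a value's data type.

```python
count = 10                              # integer
price = 20.5                            # float
greeting = "Hello, World!"              # string
numbers = [1, 2, 3]                     # list
point = (6, 7, 8)                       # tuple
person = {"name": "Alice", "age": 30}   # dictionary

# Print each variable together with its type
for value in (count, price, greeting, numbers, point, person):
    print(value, type(value))

total = count + price        # int + float gives a float
shout = greeting.upper()     # string method returning a new string
numbers.append(4)            # lists can grow in place
person["age"] += 1           # update a dictionary value by key

print(total, shout, numbers, point[0], person)
```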

### Functions

Functions in Python are blocks of reusable code that perform a specific task. They help organize your code, make it more readable, and avoid repetition.

1. **Defining a Function**
To define a function in Python, use the `def` keyword, followed by the function name, parentheses `()`, and a colon `:`. The code block inside the function is indented.

```python
def greet():
    print("Hello, world!")
```
- `greet`: The name of the function.
- `print("Hello, world!")`: The code that will be executed when the function is called.

2. **Calling a Function**
Once a function is defined, you can call it by writing its name followed by parentheses.

```python
greet()
```
- This will output: `Hello, world!`

3. **Function Parameters**
Functions can accept parameters, which let you pass data into the function. The values you supply when calling the function are called arguments.

```python
def greet(name):
    print(f"Hello, {name}!")
```
- `name`: A parameter that the function expects when it is called.

```python
greet("Alice")
```
- This will output: `Hello, Alice!`

4. **Returning Values**
Functions can return values using the `return` statement.

```python
def add(a, b):
    return a + b
```
- `return a + b`: This will return the sum of `a` and `b`.

```python
result = add(3, 5)
print(result)
```
- This will output: `8`

5. **Default Parameters**
You can set default values for parameters. If the argument is not provided, the default value is used.

```python
def greet(name="world"):
    print(f"Hello, {name}!")
```

```python
greet()
```
- This will output: `Hello, world!`
  
```python
greet("Alice")
```
- This will output: `Hello, Alice!`

6. **Keyword Arguments**
You can call functions using keyword arguments, which allows you to specify the parameters by name.

```python
def add(a, b):
    return a + b

result = add(b=2, a=3)
print(result)
```
- This will output: `5`

7. **Variable-Length Arguments**
Functions can accept a variable number of arguments using `*args` (for positional arguments) and `**kwargs` (for keyword arguments).

```python
def add(*args):
    return sum(args)

print(add(1, 2, 3, 4))
```
- This will output: `10`

```python
def display_info(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

display_info(name="Alice", age=30)
```
- This will output:
  ```
  name: Alice
  age: 30
  ```
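Both forms can be combined in one function. In the sketch below, `describe_order` is a hypothetical function: `item` is a regular parameter, `*extras` collects any further positional arguments into a tuple, and `**options` collects keyword arguments into a dictionary.

```python
def describe_order(item, *extras, **options):
    """Build a one-line description of an order (illustrative only)."""
    parts = [item]
    parts.extend(extras)                  # extras is a tuple of positional arguments
    for key, value in options.items():    # options is a dict of keyword arguments
        parts.append(f"{key}={value}")
    return ", ".join(parts)

print(describe_order("pizza"))                                   # pizza
print(describe_order("pizza", "olives", "basil", size="large"))  # pizza, olives, basil, size=large
```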


### Loops and List Comprehensions

Loops and list comprehensions allow you to execute code repeatedly or generate lists efficiently.

1. **For Loops**
A `for` loop is used to iterate over a sequence (like a list, tuple, string, or range) and execute a block of code for each item.

Basic Syntax:
```python
for item in sequence:
    # Code block to execute
```

Example:
```python
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)
```
- This will output:
  ```
  apple
  banana
  cherry
  ```

2. **While Loops**
A `while` loop repeats as long as a specified condition is true.

 Basic Syntax:
```python
while condition:
    # Code block to execute
```

 Example:
```python
count = 0
while count < 5:
    print(count)
    count += 1
```
- This will output:
  ```
  0
  1
  2
  3
  4
  ```

3. **Break and Continue**
- `break`: Exits the loop prematurely.
- `continue`: Skips the rest of the code inside the loop for the current iteration and moves to the next iteration.

Example:
```python
for i in range(10):
    if i == 5:
        break
    print(i)
```
- This will output:
  ```
  0
  1
  2
  3
  4
  ```

```python
for i in range(10):
    if i % 2 == 0:
        continue
    print(i)
```
- This will output:
  ```
  1
  3
  5
  7
  9
  ```

4. **List Comprehensions**
List comprehensions provide a concise way to create lists. For simple transformations and filters they are usually shorter, and often faster, than building the same list with an explicit loop.

Basic Syntax:
```python
[expression for item in iterable if condition]
```

 Example:
```python
squares = [x**2 for x in range(10)]
print(squares)
```
- This will output:
  ```
  [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
  ```

With a Condition:
```python
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)
```
- This will output:
  ```
  [0, 4, 16, 36, 64]
  ```
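For comparison, the comprehension with a condition does the same work as the explicit loop below. This is only a side-by-side sketch; both versions produce the same list.

```python
# Loop version: start with an empty list and append matching items
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x**2)

# Comprehension version: expression, loop, and condition on one line
even_squares = [x**2 for x in range(10) if x % 2 == 0]

print(even_squares_loop == even_squares)  # True
```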



### Python Unpacking

Unpacking in Python allows you to assign elements of a sequence (like a list, tuple, or string) to variables in a single statement. It's a powerful feature that makes code cleaner and more readable. Here's a brief guide to understanding and using unpacking.

1. **Basic Unpacking**
Unpacking lets you assign each element of a sequence to a variable.

Example with Tuples:
```python
tup = (1, 2, 3)
a, b, c = tup

print(a)  # Outputs: 1
print(b)  # Outputs: 2
print(c)  # Outputs: 3
```

Example with Lists:
```python
lst = [4, 5, 6]
x, y, z = lst

print(x)  # Outputs: 4
print(y)  # Outputs: 5
print(z)  # Outputs: 6
```

2. **Unpacking with Strings**
You can also unpack strings into individual characters.

```python
s = "hi"
a, b = s

print(a)  # Outputs: h
print(b)  # Outputs: i
```

3. **Using the Asterisk (`*`) for Extended Unpacking**
The asterisk `*` can be used to capture multiple elements during unpacking. This is useful when you want to unpack part of a sequence while capturing the rest.

Example:
```python
numbers = [1, 2, 3, 4, 5]
a, b, *rest = numbers

print(a)      # Outputs: 1
print(b)      # Outputs: 2
print(rest)   # Outputs: [3, 4, 5]
```

You can also place the `*` in different positions:

```python
a, *middle, b = numbers

print(a)      # Outputs: 1
print(middle) # Outputs: [2, 3, 4]
print(b)      # Outputs: 5
```
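Unpacking also appears in a few everyday patterns: swapping two variables, looping over pairs, and iterating over dictionary items. The data below is made up for illustration.

```python
# Swap two variables without a temporary one
a, b = 1, 2
a, b = b, a
print(a, b)  # Outputs: 2 1

# Unpack (name, age) pairs directly in the loop header
people = [("Alice", 30), ("Bob", 25)]
for name, age in people:
    print(f"{name} is {age}")

# enumerate() yields (index, item) pairs that can be unpacked the same way
for index, (name, age) in enumerate(people):
    print(index, name, age)

# dict.items() yields (key, value) pairs
ages = dict(people)
for key, value in ages.items():
    print(key, value)
```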


### Pandas

We have already seen `pandas` in action, so let's take a closer look!

Pandas is a key library to manipulate and preprocess data.

1. **Importing Pandas**
To use pandas, you need to import it into your Jupyter notebook.

```python
import pandas as pd
```

2. **Core Data Structures**
Pandas primarily provides two data structures:

- **Series**: A one-dimensional labeled array, similar to a column in a spreadsheet.
- **DataFrame**: A two-dimensional labeled data structure with columns of potentially different types, similar to a table in a relational database or an Excel spreadsheet.

In the majority of cases we will work with DataFrames, but if you need a Series you can create one like this:

```python
import pandas as pd

data = [1, 3, 5, 7, 9]
series = pd.Series(data)
print(series)
```
- Outputs:
  ```
  0    1
  1    3
  2    5
  3    7
  4    9
  dtype: int64
  ```

Creating a DataFrame:
```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df)
```
- Outputs:
  ```
        Name  Age
  0    Alice   25
  1      Bob   30
  2  Charlie   35
  ```

3. **Loading Data** (we saw this already)
Pandas makes it easy to load data from various sources, such as CSV, Excel, and JSON files, as well as SQL databases.

 Loading a CSV File:
```python
df = pd.read_csv('data.csv')
```
- `pd.read_csv('data.csv')`: Loads a CSV file into a DataFrame.

4. **Exploring Data**
Once your data is loaded into a DataFrame, you can explore and understand it using several methods.

 Viewing the DataFrame:
```python
print(df.head())  # Displays the first 5 rows
print(df.tail())  # Displays the last 5 rows
df.info()  # Prints a concise summary; info() prints directly and returns None
print(df.describe())  # Generates summary statistics
```
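To try these methods without a data file, the sketch below rebuilds the small `Name`/`Age` table from the DataFrame example and adds a few related attributes (`shape`, `columns`, `dtypes`) that are often checked first.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

print(df.head(2))         # First 2 rows (the default is 5)
print(df.shape)           # (number of rows, number of columns)
print(list(df.columns))   # Column names
print(df.dtypes)          # Data type of each column
print(df['Age'].mean())   # Summary statistic for a single column
```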

5. **Selecting Data**
You can select specific columns or rows from a DataFrame.

 Selecting Columns:
```python
ages = df['Age']  # Select a single column
names_and_ages = df[['Name', 'Age']]  # Select multiple columns
```

 Selecting Rows:
```python
first_row = df.iloc[0]  # Select the first row by integer position
subset = df.iloc[0:2]  # Select the first two rows
```

 Selecting by Condition:
```python
adults = df[df['Age'] >= 30]  # Select rows where the Age is 30 or older
```
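Conditions can be combined, and `.loc` can select rows and columns in one step. The sketch below uses the small example table again. Note that each condition needs its own parentheses and that pandas uses `&` and `|` rather than `and` and `or` for element-wise logic.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Rows where Age is at least 30 AND the name is not 'Bob'
older_not_bob = df[(df['Age'] >= 30) & (df['Name'] != 'Bob')]
print(older_not_bob)

# Rows matching a condition, restricted to a single column
names_30_plus = df.loc[df['Age'] >= 30, 'Name']
print(names_30_plus.tolist())
```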

6. **Modifying Data**
Pandas allows you to modify your DataFrame by adding or removing columns and rows.

 Adding a Column:
```python
df['Height'] = [160, 175, 180]  # Add a new column
```

 Removing a Column:
```python
df = df.drop('Height', axis=1)  # Remove a column
```

 Modifying Values:
```python
df.loc[0, 'Age'] = 26  # Modify a specific value
```

7. **Handling Missing Data**
Missing data is common in real-world datasets. Pandas provides tools to detect, fill, and drop missing values.

 Checking for Missing Data:
```python
print(df.isnull())  # Returns a DataFrame with True where data is missing
print(df.isnull().sum())  # Count missing values in each column
```

 Filling Missing Data:
```python
df['Age'] = df['Age'].fillna(df['Age'].mean())  # Fill missing values with the mean
```

 Dropping Missing Data:
```python
df = df.dropna()  # Drop rows with any missing values
```
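The three steps can be tried together on a small table with one missing value. This is a sketch with made-up data; `np.nan` is used here to mark the missing age.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, np.nan, 35]})

print(df.isnull().sum())   # One missing value in 'Age'

# Option 1: fill the gap with the mean of the known ages (25 and 35)
filled = df.copy()
filled['Age'] = filled['Age'].fillna(filled['Age'].mean())
print(filled)

# Option 2: drop every row that has a missing value
dropped = df.dropna()
print(dropped)
```

Which option fits depends on the data: filling keeps all rows but introduces estimated values, while dropping keeps only fully observed rows.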

8. **Grouping and Aggregation**
You can group your data and perform aggregate operations like sum, mean, count, etc.

 Grouping and Aggregating:
```python
grouped = df.groupby('Age').size()  # Group by 'Age' and count the occurrences
print(grouped)
```
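Grouping is usually more informative with a categorical column. The sketch below assumes a hypothetical `City` column, which is not part of the earlier example table, and computes per-group statistics of `Age`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'City': ['Berlin', 'Paris', 'Berlin', 'Paris'],   # Hypothetical column
    'Age': [25, 30, 35, 40],
})

mean_age = df.groupby('City')['Age'].mean()   # Average age per city
print(mean_age)

# Several aggregates at once
summary = df.groupby('City')['Age'].agg(['count', 'min', 'max'])
print(summary)
```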

9. **Merging and Joining**
You can combine multiple DataFrames using merge and join operations.

 Merging DataFrames:
```python
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key', how='inner')  # Inner join on 'key'
print(merged_df)
```
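The `how` argument controls which keys are kept. As a sketch using the same two frames: an inner join keeps only keys present in both, while an outer join keeps all keys and fills the gaps with `NaN`.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})

inner = pd.merge(df1, df2, on='key', how='inner')
outer = pd.merge(df1, df2, on='key', how='outer')

print(inner['key'].tolist())   # Keys found in both frames
print(len(outer))              # One row per distinct key
print(outer)                   # 'C' has no value2 and 'D' has no value1
```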

10. **Saving Data**
After processing your data, you can save it to various formats.

 Saving to a CSV File:
```python
df.to_csv('output.csv', index=False)  # Save DataFrame to a CSV file without the index
```
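A quick way to check a saved file is to read it back. The sketch below writes the example table to a temporary folder (the location is an assumption made so that the example runs anywhere) and compares the result with the original.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

path = os.path.join(tempfile.mkdtemp(), 'output.csv')  # Temporary location
df.to_csv(path, index=False)                           # Write without the index column

restored = pd.read_csv(path)
print(list(restored.columns))   # Only the data columns, no extra index column
print(restored.equals(df))      # True if the round trip preserved the data
```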