# Working with Data Structures: Lists, Dictionaries, and Data Frames

## Objective
By the end of this tutorial, you'll have a solid understanding of three essential data structures used in Python and their practical applications:
- Lists
- Dictionaries
- Data Frames


## Pre-requisites
- Python Environment: You should have a Python environment set up on your system. If you don't have Python installed, you can download it from the [official Python website](https://www.python.org/downloads/). We recommend downloading Python 3.8 or above.
- Jupyter Notebook: This code is intended to be run in a Jupyter Notebook environment. Make sure you have [Jupyter Notebook installed](https://jupyter.org/install).
- Library Installation: You need to install the required libraries:
    - pandas

To install pandas, run **!pip3 install pandas** in a notebook cell:

In [1]:
# Install the required libraries
!pip3 install pandas


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m


### Introduction

Data structures store and organize data so that it can be accessed and modified efficiently. The right choice of data structure depends on the requirements of your application and the operations you need to perform on the data.

### Lists

- A list is a collection of items that can be of different types, such as numbers, strings, or other objects.
- Lists are defined using square brackets **[ ]**.
- Lists are ordered and mutable, which means you can add, remove, or change elements after the list is created.

In [3]:
my_list = [1, 2, 3, 4, 5]
print(my_list)

[1, 2, 3, 4, 5]


##### List Operations

- **Indexing**: Accessing elements by their position.

In [15]:
my_list = [10, 20, 30, 40]
print(my_list[0])  

10


- **Slicing**: Extracting a portion of the list.

In [14]:
my_list = [10, 20, 30, 40]
print(my_list[1:3])  

[20, 30]
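
As a small illustrative sketch (the variable names are our own), indexing and slicing also accept negative positions and an optional step. The stop index of a slice is always excluded:

```python
my_list = [10, 20, 30, 40]

last_item = my_list[-1]        # Negative indices count from the end
first_two = my_list[:2]        # Omitting the start means "from the beginning"
every_other = my_list[::2]     # A step of 2 takes every second element
reversed_copy = my_list[::-1]  # A step of -1 returns a reversed copy

print(last_item)      # 40
print(first_two)      # [10, 20]
print(every_other)    # [10, 30]
print(reversed_copy)  # [40, 30, 20, 10]
```

Slicing always returns a new list, so `my_list` itself is left unchanged.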


- **Concatenation (+)**: Combining two lists into a new list using the + operator.

In [13]:
list1 = [1, 2]
list2 = [3, 4]
result = list1 + list2
print(result) 

[1, 2, 3, 4]


- **Repetition (*)**: Repeating a list a certain number of times.

In [12]:
my_list = [1, 2]
repeated_list = my_list * 3
print(repeated_list)  

[1, 2, 1, 2, 1, 2]


- **Append and Extend**:
The append method adds an element to the end of the list. The extend method appends elements from another iterable to the end of the current list.

In [11]:
my_list = [1, 2]
my_list.append(3)
print(my_list)  # Outputs: [1, 2, 3]

list1 = [1, 2]
list2 = [3, 4]
list1.extend(list2)
print(list1) 

[1, 2, 3]
[1, 2, 3, 4]
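
The difference between the two methods is easiest to see when the argument is itself a list. The following sketch (with variable names chosen for illustration) passes the same list to both:

```python
nested = [1, 2]
nested.append([3, 4])   # The whole list is added as a single element
print(nested)           # [1, 2, [3, 4]]
print(len(nested))      # 3

flat = [1, 2]
flat.extend([3, 4])     # Each element of the iterable is added separately
print(flat)             # [1, 2, 3, 4]
print(len(flat))        # 4
```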


- **Insert**: Adding an element at a specific position within the list.

In [10]:
my_list = [1, 2, 3]
my_list.insert(1, 4)  # Insert 4 at index 1
print(my_list) 

[1, 4, 2, 3]


- **Remove and Pop**:
The remove method deletes the first occurrence of a specified value from the list.
The pop method removes and returns an element based on the provided index.

In [17]:
# Remove method
my_list = [1, 2, 3, 2]
my_list.remove(2)  # Removes the first '2'
print(my_list)

[1, 3, 2]


In [19]:
# Pop method
my_list = [1, 2, 3]
popped_element = my_list.pop(1)  # Removes and returns element at index 1
print(popped_element)

2
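
Two further behaviors are worth knowing, sketched here with an illustrative list: calling pop() without an index removes the last element, and remove() raises a ValueError when the value is not in the list.

```python
my_list = [1, 2, 3]

last = my_list.pop()      # With no index, pop() removes the last element
print(last)               # 3
print(my_list)            # [1, 2]

try:
    my_list.remove(99)    # 99 is not in the list, so this raises ValueError
except ValueError:
    print("99 is not in the list")

# Checking membership first avoids the exception
if 2 in my_list:
    my_list.remove(2)
print(my_list)            # [1]
```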


- **Len**: Getting the length of the list using the len function.

In [20]:
my_list = [1, 2, 3, 4]
length = len(my_list)
print(length) 

4


### Dictionaries

A dictionary is a collection of key-value pairs. Keys are unique identifiers used to access or reference the associated values. Since Python 3.7, dictionaries preserve insertion order, but elements are accessed by their keys rather than by position.

In [21]:
person_info = {
    "name": "Alice",
    "age": 30,
    "city": "New York"
}

##### Dictionary Operations


- **Accessing values by keys**:
You can access values associated with keys using square brackets or the get() method. Square brackets raise a KeyError if the key is not present, while get() returns None or a default value that you provide.

In [22]:
my_dict = {"name": "Alice", "age": 30}
name = my_dict["name"]  # Accessing using square brackets
age = my_dict.get("age")  # Accessing using get()
print(name)
print(age)   

Alice
30
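
The example above only looks up keys that exist. As a minimal sketch of the missing-key case (the "city" key and the "Unknown" default are chosen for illustration):

```python
my_dict = {"name": "Alice", "age": 30}

city = my_dict.get("city", "Unknown")  # Key is missing, so the default is returned
print(city)                            # Unknown

missing = my_dict.get("city")          # Without a default, get() returns None
print(missing)                         # None

try:
    my_dict["city"]                    # Square brackets raise KeyError for a missing key
except KeyError as err:
    print(f"KeyError: {err}")          # KeyError: 'city'
```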


- **Modifying values**: You can change the value associated with a specific key by assigning a new value to it.

In [23]:
my_dict = {"name": "Alice", "age": 30}
my_dict["age"] = 31  # Modifying the value associated with the "age" key
print(my_dict)  

{'name': 'Alice', 'age': 31}


- **Adding new key-value pairs**: To add a new key-value pair to a dictionary, simply assign a value to a new key.

In [24]:
my_dict = {"name": "Alice", "age": 30}
my_dict["city"] = "New York"  # Adding a new key-value pair
print(my_dict)

{'name': 'Alice', 'age': 30, 'city': 'New York'}


- **Removing key-value pairs**:
You can remove key-value pairs from a dictionary using the del statement or the pop() method.
    - del deletes a key-value pair.
    - pop() removes a pair and returns the associated value.

In [27]:
# del method
my_dict = {"name": "Alice", "age": 30}
del my_dict["age"]  # Removing the "age" key-value pair
print(my_dict) 

{'name': 'Alice'}


In [26]:
# pop() method
my_dict = {"name": "Alice", "age": 30}
age = my_dict.pop("age")  # Removing and returning the "age" value
print(age) 

30
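
Both del and pop() raise a KeyError if the key is missing. One way to avoid this, sketched below with an illustrative "city" key, is to pass a default as the second argument to pop():

```python
my_dict = {"name": "Alice", "age": 30}

# "city" is not a key, so pop() returns the default instead of raising KeyError
city = my_dict.pop("city", None)
print(city)     # None
print(my_dict)  # {'name': 'Alice', 'age': 30}
```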


- **Checking if a key exists**: You can check if a key exists in a dictionary using the in operator.

In [29]:
my_dict = {"name": "Alice", "age": 30}
has_name = "name" in my_dict  # True if "name" is a key in my_dict
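
Note that the in operator tests keys, not values. A short sketch of the distinction, using illustrative variable names:

```python
my_dict = {"name": "Alice", "age": 30}

key_found = "name" in my_dict               # Tests the keys
value_as_key = "Alice" in my_dict           # "Alice" is a value, not a key
value_found = "Alice" in my_dict.values()   # Tests the values explicitly
key_absent = "city" not in my_dict          # "not in" negates the test

print(key_found)     # True
print(value_as_key)  # False
print(value_found)   # True
print(key_absent)    # True
```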

### Data Frames

-  A data frame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure.
-  It's a primary data structure used in the pandas library for data analysis and manipulation.
-  Think of a data frame as a table of rows and columns, like a recipe book in which each row is one recipe and each column records one detail about it.
-  The table can grow or shrink as you add or remove rows and columns, and each column can hold a different type of data.


In [39]:
import pandas as pd

# Generating synthetic data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
# Create a data frame
df = pd.DataFrame(data)

# View dataframe
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | San Francisco |
| 2 | Charlie | 35 | Los Angeles |


##### Data Frame Operations

- While data frames don't have "operators" in the same way that lists or dictionaries do, they have various methods and operations that allow you to work with tabular data effectively.
- Let's explore some essential data frame operations with explanations and examples:
    - Creating a data frame: You can create a data frame from various data sources, including dictionaries, lists, CSV files, or database queries.
    - Viewing Data: You can inspect the contents of a data frame using the head() or tail() methods to see the first or last few rows.
    - Selecting and filtering data: You can select specific columns of a data frame by referencing them using square brackets or dot notation, and filter rows with a boolean condition.
    - Grouping and aggregating data: You can group data based on one or more columns and then perform aggregate functions like sum, mean, or count on those groups.
    - Adding and removing columns: You can add new columns or remove existing ones.
    - Sorting Data: You can sort the data frame by one or more columns.
    - Merging data frames: You can combine two data frames based on a common column or index using operations like merge() or concat().
    - Handling missing data: Data frames provide methods for handling missing data, such as fillna() or dropna().
    - Statistical Analysis: You can perform various statistical analyses on a data frame using methods like describe() or corr().

In [42]:
import pandas as pd

# Creating a dataframe
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}

df = pd.DataFrame(data)  # Creating a DataFrame from a dictionary

In [41]:
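# Viewing data (a notebook cell displays only its last expression)
df.head()  # View the first few rows (5 by default)
df.tail()  # View the last few rows (5 by default); this is the result displayed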

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | San Francisco |
| 2 | Charlie | 35 | Los Angeles |
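
Beyond head() and tail(), a few other attributes and methods help when inspecting a data frame. This sketch rebuilds the same sample data so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
})

first_two = df.head(2)   # head() and tail() accept the number of rows to return
print(first_two)

print(df.shape)          # (3, 3): number of rows, number of columns
print(df.dtypes)         # Data type of each column
print(df.describe())     # Summary statistics for the numeric columns
```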


In [44]:
# Selecting data (only the last expression, df.Age, is displayed)
df['Name']  # Selecting a column using square brackets
df.Age  # Selecting a column using dot notation

0    25
1    30
2    35
Name: Age, dtype: int64

In [47]:
# Filtering data

df[df.Age > 30] # Filtering rows where Age is greater than 30

|   | Name | Age | City |
|---|---|---|---|
| 2 | Charlie | 35 | Los Angeles |
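
Filters can also combine several conditions. In the sketch below (variable names are illustrative), each condition is wrapped in parentheses and joined with & for "and" or | for "or". The .loc accessor selects rows and columns in one step:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
})

# Rows where Age is over 30 OR the city is New York
older_or_ny = df[(df['Age'] > 30) | (df['City'] == 'New York')]
print(older_or_ny)

# Rows where Age is at least 30, keeping only the Name and Age columns
names_30_plus = df.loc[df['Age'] >= 30, ['Name', 'Age']]
print(names_30_plus)
```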


In [49]:
# Grouping data
df.groupby('City')['Age'].mean()  # Average age per city

City
Los Angeles      35.0
New York         25.0
San Francisco    30.0
Name: Age, dtype: float64
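
Because every city in the sample data appears only once, each group above contains a single row. The following sketch uses hypothetical data in which two people share each city, so the aggregation has something to combine:

```python
import pandas as pd

# Hypothetical data: two people per city
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 45],
    'City': ['New York', 'San Francisco', 'New York', 'San Francisco']
})

# agg() applies several aggregate functions to each group at once
summary = people.groupby('City')['Age'].agg(['mean', 'count'])
print(summary)
```

Here the mean age is 30.0 for New York and 37.5 for San Francisco, with a count of 2 in each group.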

In [51]:
# Add or remove columns

df['Income'] = [50000, 60000, 75000]  # Adding a new column
df.drop('Income', axis=1, inplace=True)  # Removing a column
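
A new column can also be computed from an existing one, and drop() can return a new data frame instead of modifying the original. A minimal sketch, where the AgeInMonths column is a hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

df['AgeInMonths'] = df['Age'] * 12          # New column computed from an existing one
trimmed = df.drop(columns=['AgeInMonths'])  # Without inplace=True, df is left unchanged

print(list(df.columns))       # ['Name', 'Age', 'AgeInMonths']
print(list(trimmed.columns))  # ['Name', 'Age']
```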

In [52]:
# Sorting data

df.sort_values(by='Age', ascending=False)

|   | Name | Age | City |
|---|---|---|---|
| 2 | Charlie | 35 | Los Angeles |
| 1 | Bob | 30 | San Francisco |
| 0 | Alice | 25 | New York |
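
Sorting by more than one column works by passing lists to by and ascending. The sketch below uses hypothetical data with tied ages so that the second sort key matters:

```python
import pandas as pd

# Hypothetical data with repeated ages
people = pd.DataFrame({
    'Name': ['Dana', 'Bob', 'Charlie', 'Alice'],
    'Age': [25, 25, 30, 30]
})

# Sort by Age (descending), then by Name (ascending) to break ties
sorted_people = people.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(sorted_people)
```

The resulting order of names is Alice, Charlie, Bob, Dana. Like drop(), sort_values() returns a new data frame and leaves the original unchanged.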


In [61]:
import pandas as pd

data1 = {
    'Vegetables': ['Cauliflower', 'Tomato', 'Potato'],
    'Cost': [35, 40, 50],
    'AmountVeggie': ['4', '5', '6']
}

data2 = {
    'Fruits': ['Apple', 'Banana', 'Cherry'],
    'Cost': [25, 30, 35],
    'AmountFr': ['8', '9', '3']
}

# Convert dictionaries to data frames
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Merge data frames based on a common column
merged_df = pd.merge(df1, df2, on='Cost')
merged_df

|   | Vegetables | Cost | AmountVeggie | Fruits | AmountFr |
|---|---|---|---|---|---|
| 0 | Cauliflower | 35 | 4 | Cherry | 3 |
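
Only one row appears because merge() performs an inner join by default, keeping rows whose Cost value is present in both data frames. The following sketch, using trimmed-down versions of the same data, contrasts this with an outer join and with concat():

```python
import pandas as pd

df1 = pd.DataFrame({'Vegetables': ['Cauliflower', 'Tomato', 'Potato'],
                    'Cost': [35, 40, 50]})
df2 = pd.DataFrame({'Fruits': ['Apple', 'Banana', 'Cherry'],
                    'Cost': [25, 30, 35]})

inner = pd.merge(df1, df2, on='Cost')               # Default how='inner': matching rows only
outer = pd.merge(df1, df2, on='Cost', how='outer')  # All rows; NaN where there is no match
print(len(inner))  # 1
print(len(outer))  # 5

# concat() stacks the frames on top of each other instead of matching on a key
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.shape)  # (6, 3)
```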


In [62]:
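# Handling missing values (both methods return a new data frame;
# df has no missing values here, so the displayed result is unchanged)
df.fillna(0)  # Replace missing values with 0
df.dropna()   # Remove rows with missing values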

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | San Francisco |
| 2 | Charlie | 35 | Los Angeles |
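
Since the sample data frame has no missing values, the effect of these methods is not visible above. As an illustration, the sketch below uses hypothetical data with one missing age:

```python
import pandas as pd

# Hypothetical data: None becomes NaN in the numeric Age column
scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, None, 35]
})

print(scores['Age'].isna().sum())    # 1 missing value

filled = scores.fillna({'Age': 0})   # Replace missing ages with 0
print(filled['Age'].tolist())        # [25.0, 0.0, 35.0]

dropped = scores.dropna()            # Drop rows that contain a missing value
print(len(dropped))                  # 2
print(len(scores))                   # 3: the original is unchanged
```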


### Summary

- These are some of the common data frame operations in pandas, which make it a powerful tool for data manipulation and analysis in Python.
- Data frames provide a structured way to work with tabular data, making it easier to perform tasks such as data cleaning, exploration, and visualization.

- To summarize, here's a quick comparison of lists, dictionaries, and data frames:
    - Lists: Ordered, mutable, elements accessed by index.
    - Dictionaries: Key-value pairs, insertion-ordered since Python 3.7, elements accessed by keys.
    - Data frames: Tabular, two-dimensional, ideal for data analysis.

In conclusion, understanding these data structures is essential for any Python programmer. Lists are versatile for storing ordered data, dictionaries are efficient for key-value pairs, and data frames are powerful tools for data manipulation. Thank you!