# Introduction to Pandas Series and DataFrames

## Objectives

* Understand Pandas Series and DataFrames
* Creating Series and DataFrames
* Basic Operations with Series 
* Exploring DataFrame Basics
* Selecting Data from DataFrames
* Applying Functions to Series and DataFrames

## Loading Libraries

In [3]:
# numpy - for arithmetic operations and high-level mathematical functions that operate on arrays
import numpy as np
# pandas - for working with relational or labeled data
import pandas as pd 

## What is a Pandas Series?

* **One-dimensional** labeled array capable of holding data of any type, such as *integers*, *strings*, *floats*, *Python objects*, etc.
* A pandas series is like a column in a table.


### Key features of a Pandas Series

* **Homogeneous Data**: A Series holds data of a single data type (`dtype`) such as integer, float or string. Values of mixed types are stored under the generic `object` dtype.
* **Labeled Index**: Each element in a Series is associated with a label called an *index*. Unique labels are common practice, though not strictly required. The labels only need to be hashable, i.e. usable as keys in a dictionary. The index allows for easy and efficient data retrieval and manipulation.
* **Vectorized Operations**: Series support vectorized operations, i.e. you can apply an operation to the entire Series without writing an explicit loop.
* **Alignment of Data**: When performing operations between Series, Pandas automatically aligns data based on index labels, which simplifies data manipulation.
* **Creation**: A Series can be created from a list, a NumPy array, a dictionary, a DataFrame slice and other data sources.
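
The vectorized operations and label alignment described above can be illustrated with a minimal sketch. The Series names `jan` and `feb` and their values are made up for illustration and are not part of the lesson data.

```python
import pandas as pd

# Two small Series with partially overlapping labels (illustrative values)
jan = pd.Series([100, 200, 300], index=["A", "B", "C"])
feb = pd.Series([10, 20, 30], index=["B", "C", "D"])

# Vectorized operation: every element is doubled without an explicit loop
doubled = jan * 2
print(doubled)

# Alignment: values are matched by label, not by position.
# Labels found in only one Series ("A" and "D") produce NaN in the result.
total = jan + feb
print(total)
```

Because alignment works on labels, `total["B"]` is `200 + 10`, even though the two values sit at different positions in their respective Series.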

In [4]:
# example of a series from a list 
marks = [10, 20, 33, 42, 19, 30]

# series
marks_series = pd.Series(marks)
marks_series

0    10
1    20
2    33
3    42
4    19
5    30
dtype: int64

## Creating and Displaying

In [5]:
# example 1 - Creating a series from a list
data = [10.5, 11.2, 10.7, 9.9, 10.2]

# series
list_series = pd.Series(data, name="Student Marks")
list_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Student Marks, dtype: float64

In [6]:
# data type 
type(list_series)

pandas.core.series.Series

In [8]:
# example 2 - Creating a series from a NumPy Array
data_arr = np.array(data) # created an array from a list

type(data_arr)

numpy.ndarray

In [9]:
# series from array
arr_series = pd.Series(data_arr, name="Array Series")
arr_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [10]:
# example 3 - Creating a series from a dictionary
data_dict = {
    "Prof" : 100,
    "Dominic" : 250,
    "Carol" : 300, 
    "Eve" : 450
}

type(data_dict)

dict

In [11]:
# series from dict
dict_series = pd.Series(data_dict, name="Sky Team")
dict_series

Prof       100
Dominic    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [17]:
# series with custom index labels
balance = [1000, 1500, 2000, 4000] # data to store in the series
custom_labels = ['A', 'B', 'C', 'D'] # custom indexes

custom_label_series = pd.Series(data = balance, index=custom_labels, name='Balances')
custom_label_series

A    1000
B    1500
C    2000
D    4000
Name: Balances, dtype: int64

## Basic Operations With Series

In [18]:
arr_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [19]:
# accessing elements in a series 
print(arr_series[3])

9.9


In [20]:
dict_series

Prof       100
Dominic    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [21]:
# accessing elements in a series 
print(dict_series['Carol'])

300


In [22]:
custom_label_series

A    1000
B    1500
C    2000
D    4000
Name: Balances, dtype: int64

In [24]:
# accessing a range of elements with a label slice (both end labels are included)
print(custom_label_series['B':'D'])

B    1500
C    2000
D    4000
Name: Balances, dtype: int64


In [25]:
# arithmetic operations
# divide every balance by 100 (applied element-wise)
x = custom_label_series / 100
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64

In [26]:
# filter elements 
x_filtered = x[x >= 15]
x_filtered

B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64

In [28]:
# basic summary statistics
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64

In [29]:
# mean 
mean = x.mean()
print(mean)

21.25


In [30]:
# std 
std = x.std()
print(std)

13.149778198382917


In [31]:
# max (stored as max_value to avoid shadowing Python's built-in max)
max_value = x.max()
print(max_value)

40.0


## Applying Functions to a Series 

### Lambda Functions

* Small anonymous function that is not bound to an identifier.
* Similar to user defined functions but without a name.
* It's simple and straightforward, requiring only the argument(s) and an expression, alongside the keyword `lambda`.
* They require only one line of code.

```
def func_name(parameters):
    code block
    
    return return_value
```

`func = lambda parameters: return_value`

* `lambda` : Keyword that indicates definition of a lambda function.
* `parameters`: The input parameters that the lambda function will take.
* `return_value`: A single expression that defines the computation the lambda function performs and its return value.

In [94]:
# let's compare the two


In [93]:
# lambda function
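
The two cells above are empty in the notebook. As a minimal sketch of the comparison, the same computation can be written both ways. The name `square` is taken from the `apply()` section later in this lesson; `square_lambda` is a hypothetical name used only here.

```python
# A regular named function that squares its argument
def square(number):
    return number ** 2

# The same computation written as a lambda expression
square_lambda = lambda number: number ** 2

print(square(4))         # 16
print(square_lambda(4))  # 16
```

Both forms behave identically when called. The lambda form is mostly useful when a short function is passed straight into another call, such as `apply()` or `map()`.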


### Generate Random Numbers

* Using `NumPy` library to generate random Numbers.


In [92]:
# generate random numbers 


In [91]:
# display random numbers 


In [90]:
# create a series 


In [89]:
# display the first five rows of the series


In [88]:
# display last five rows
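
The cells above are left empty, so the following is only one possible way to fill them in. The sample size (20), the value range (1 to 100), the seed and the names `rng`, `random_numbers` and `random_series` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary)
rng = np.random.default_rng(42)

# 20 random integers from 1 to 100; `high` is exclusive, hence 101
random_numbers = rng.integers(low=1, high=101, size=20)
print(random_numbers)

# Wrap the NumPy array in a Series
random_series = pd.Series(random_numbers, name="Random Numbers")

print(random_series.head())  # first five rows
print(random_series.tail())  # last five rows
```

`head()` and `tail()` return five rows by default; pass a number, for example `head(7)`, to change that.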


### Using the `apply()` Function in a Series

* `apply()` calls a function on every value in a Series and returns a new Series with the results, which makes it a powerful way to transform and analyze the data.
* Above, we generated a Series of random numbers and created a function called `square` that takes an integer, squares it and returns the result. Let's apply that function to the Series.

In [None]:
# square the series random numbers 

In [None]:
# use .rename to rename the series
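
A minimal sketch of these two steps follows. It rebuilds a small random Series so that it runs on its own; the seed, size, range and the new name `"Squared Numbers"` are assumptions, not values from the original notebook.

```python
import numpy as np
import pandas as pd

def square(number):
    """Return the square of a single number."""
    return number ** 2

# Stand-in for the random Series created earlier (assumed size and range)
rng = np.random.default_rng(42)
random_series = pd.Series(rng.integers(1, 101, size=20), name="Random Numbers")

# apply() calls square() once per element and returns a new Series
squared_series = random_series.apply(square)

# rename() returns a copy of the Series with a new name
squared_series = squared_series.rename("Squared Numbers")
print(squared_series.head())
```

Note that `apply()` does not modify `random_series`; the result has to be assigned to a variable to be kept.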

### `lambda` function with `apply()`

In [87]:
# Cube the numbers using lambda and apply


In [70]:
# rename the series
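
One way these cells might be completed is sketched below. The random Series is rebuilt with an assumed seed, size and range so the snippet is self-contained, and `"Cubed Numbers"` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Stand-in for the random Series created earlier (assumed size and range)
rng = np.random.default_rng(42)
random_series = pd.Series(rng.integers(1, 101, size=20), name="Random Numbers")

# The lambda is defined inline, so no separate named function is needed
cubed_series = random_series.apply(lambda number: number ** 3)

# Give the result a descriptive name
cubed_series = cubed_series.rename("Cubed Numbers")
print(cubed_series.head())
```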

### Using the `map()` Function in a series

* `map()` substitutes each value in a Series with another value, which makes it a convenient way to transform a Series. The mapping can be given as a function, a dictionary or another Series.

In [86]:
# map our random numbers as pass or fail
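
The sketch below shows one possible version of this cell. The pass mark of 50 and the helper name `pass_or_fail` are assumptions chosen for illustration, as are the seed, size and range of the random Series.

```python
import numpy as np
import pandas as pd

# Stand-in for the random Series created earlier (assumed size and range)
rng = np.random.default_rng(42)
random_series = pd.Series(rng.integers(1, 101, size=20), name="Random Numbers")

def pass_or_fail(score):
    """Label a score as 'Pass' or 'Fail' (pass mark of 50 is assumed)."""
    return "Pass" if score >= 50 else "Fail"

# map() with a function: each value is replaced by the function's result
result_series = random_series.map(pass_or_fail)
print(result_series.head())

# map() with a dictionary: each value is looked up and replaced
codes = pd.Series(["P", "F", "P"])
labels = codes.map({"P": "Pass", "F": "Fail"})
print(labels)
```

With a dictionary, any value that has no matching key becomes `NaN` in the result.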


### `lambda` function with `map()`

In [78]:
# use lambda function with map() to double each number


In [79]:
# rename the series
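
A possible completion of these cells, under the same assumptions as before (arbitrary seed, 20 values between 1 and 100, and an illustrative name `"Doubled Numbers"`):

```python
import numpy as np
import pandas as pd

# Stand-in for the random Series created earlier (assumed size and range)
rng = np.random.default_rng(42)
random_series = pd.Series(rng.integers(1, 101, size=20), name="Random Numbers")

# map() with a lambda: each value is replaced by twice its value
doubled_series = random_series.map(lambda number: number * 2)

# Give the result a descriptive name
doubled_series = doubled_series.rename("Doubled Numbers")
print(doubled_series.head())
```

For simple arithmetic like this, the vectorized form `random_series * 2` gives the same values; the lambda version is shown here to practise `map()`.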

### `lambda` function with Conditional Statement

In [85]:
# are the random numbers even or odd


In [None]:
# rename the series
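
As a sketch of how these cells could look, a lambda can hold a conditional expression of the form `value_if_true if condition else value_if_false`. The Series setup and the name `"Even or Odd"` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the random Series created earlier (assumed size and range)
rng = np.random.default_rng(42)
random_series = pd.Series(rng.integers(1, 101, size=20), name="Random Numbers")

# A number is even when dividing it by 2 leaves no remainder
parity_series = random_series.map(
    lambda number: "Even" if number % 2 == 0 else "Odd"
)

# Give the result a descriptive name
parity_series = parity_series.rename("Even or Odd")
print(parity_series.head())
```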

## Series to DataFrame 

* If a **Series** is a single labeled column, a **DataFrame** is a *table* made of one or more such columns that share the same index.

In [None]:
# let's convert all the series we created into a dataframe
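
One way to do this is `pd.concat` with `axis=1`, which places each Series side by side and uses the Series names as column names. The sketch below rebuilds a few Series so it runs on its own; their names, the seed and the value range are assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in for the random Series created earlier (assumed size and range)
rng = np.random.default_rng(42)
random_series = pd.Series(rng.integers(1, 101, size=20), name="Random Numbers")

# Derived Series, each with its own name
squared_series = random_series.apply(lambda n: n ** 2).rename("Squared Numbers")
doubled_series = random_series.map(lambda n: n * 2).rename("Doubled Numbers")
parity_series = random_series.map(
    lambda n: "Even" if n % 2 == 0 else "Odd"
).rename("Even or Odd")

# axis=1 joins the Series as columns, aligned on their shared index
combined_df = pd.concat(
    [random_series, squared_series, doubled_series, parity_series], axis=1
)
print(combined_df.head())
```

Passing a dictionary of Series to `pd.DataFrame(...)` is an equivalent alternative when you want to choose the column names explicitly.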


## Knock Yourself Out!

You work as a real estate agent at *MoringaHome Realty*. To assist your clients in making informed decisions about property investment, you decide to analyze property data using Pandas. 
1. Generate 120 random numbers between Ksh 4,000 and Ksh 20,000 using NumPy to represent the prices of the houses.
2. Display the first and last 7 houses.
3. Create a function that takes in the price of a house and returns the category of that house, e.g. Suburb. The categories are of your own choosing.
4. Apply the function created above to the series.
5. Apply a lambda function to increase the property prices by 10% due to the new tax laws.
6. Apply a custom function to increase the property prices by an additional Ksh 250 for garbage collection.
7. Create a new Series for each step and finally combine them all into a DataFrame named 'Moringa_property'.

In [2]:
import numpy as np
import pandas as pd

# Step 1: Generate 120 random house prices between Ksh 4000 and Ksh 20,000
house_prices = np.random.randint(4000, 20001, 120)

# Step 2: Display the first and last 7 houses
print("First 7 houses:\n", house_prices[:7])
print("Last 7 houses:\n", house_prices[-7:])

# Step 3: Create a function to categorize houses
def categorize_house(price):
    if price < 8000:
        return 'Low-end'
    elif price < 15000:
        return 'Mid-range'
    else:
        return 'High-end'

# Step 4: Apply the categorize_house function to the series
house_categories = pd.Series(house_prices).apply(categorize_house)

# Step 5: Apply a lambda function to increase prices by 10%
house_prices_taxed = pd.Series(house_prices).apply(lambda price: price * 1.10)

# Step 6: Apply a custom function to increase prices by Ksh 250
def increase_price(price):
    return price + 250

house_prices_garbage = pd.Series(house_prices_taxed).apply(increase_price)

# Step 7: Create new Series for each step
data = {
    'House Prices': house_prices,
    'House Categories': house_categories,
    'Taxed Prices': house_prices_taxed,
    'Prices with Garbage': house_prices_garbage
}

# Step 8: Combine all Series into a DataFrame
Moringa_property = pd.DataFrame(data)

# Display the first few rows of the 'Moringa_property' DataFrame
print("\nMoringa_property DataFrame:")
print(Moringa_property.head(20))


First 7 houses:
 [12529  4878 13268  8887 16185 18552  8859]
Last 7 houses:
 [13721 14230 19707  9592 15494 14429 11392]

Moringa_property DataFrame:
    House Prices House Categories  Taxed Prices  Prices with Garbage
0          12529        Mid-range       13781.9              14031.9
1           4878          Low-end        5365.8               5615.8
2          13268        Mid-range       14594.8              14844.8
3           8887        Mid-range        9775.7              10025.7
4          16185         High-end       17803.5              18053.5
5          18552         High-end       20407.2              20657.2
6           8859        Mid-range        9744.9               9994.9
7          10331        Mid-range       11364.1              11614.1
8          12571        Mid-range       13828.1              14078.1
9          12684        Mid-range       13952.4              14202.4
10         11208        Mid-range       12328.8              12578.8
11          9276      