# What is Pandas?
<a id="pandas"> </a>

The `pandas` package is one of the most popular Python tools for data management and manipulation. Because `pandas` is built *on top* of `numpy`, many `numpy` functions and methods also work on `pandas` objects.
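
As a small illustration of that relationship, the sketch below passes a Series to a NumPy function and converts a Series back to a NumPy array. The sample values and variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Illustrative values (not taken from the text above)
values = pd.Series([1.0, 4.0, 9.0])

# NumPy functions accept a Series and return a Series with the same index
roots = np.sqrt(values)

# The underlying data can be extracted as a NumPy array
as_array = values.to_numpy()

print(roots)
print(type(as_array))
```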

## Getting started
The `pandas` package is included with Anaconda, but it can also be installed using either `conda` or `pip`.
```bash
# Use the default channel
conda install pandas

# Specify the conda-forge channel
conda install -c conda-forge pandas

# Use pip
pip install pandas
```

# Series and DataFrames
A pandas Series object is a one-dimensional labeled array that can hold any data type. It is one of two fundamental data structures provided by the pandas library. The other data structure is the DataFrame, which we'll examine next. Isolating a single column from a DataFrame results in a Series object.

A Series consists of two main components: the index and the data. The index provides labels for each element in the Series, allowing for easy and efficient data access and alignment. The data component contains the actual values.

You can create a Series using various data sources, such as lists, arrays, dictionaries, DataFrames, or even other Series objects. Here's an example of creating a Series from a list:

In [21]:
import pandas as pd

In [None]:
grades = [88, 67, 100, 92, None, 95, 82, 100, 100, 95]
grade_series = pd.Series(grades)
grade_series

Note that pandas adds a default integer index (0 through 9) when you do not specify one. The `None` entry is stored as `NaN`, so the values are stored as floats.
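
Lists are only one possible data source. The following sketch, which uses made-up values, builds a Series from a dictionary and from a NumPy array, and checks how a missing value affects the data type.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a NumPy array: a default integer index (0, 1, 2, ...) is added
from_array = pd.Series(np.array([10, 20, 30]))

# From a list containing None: the missing value is stored as NaN,
# so the Series holds floating-point values
with_missing = pd.Series([88, None, 100])

print(from_dict)
print(from_array)
print(with_missing.dtype)
```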

Alternatively, you can specify an index. In this case, the student ID is provided as the index.

In [None]:
grades = [88, 67, 100, 92, None, 95, 82, 100, 100, 95]
students = ['dmac', 'edev', 'joeb', 'tdog', 'txroy', 'sthicks', 'jfrerk', 'spickard', 'choenes', 'jsisson']
student_grades_series = pd.Series(grades, index=students)
student_grades_series
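
With a labeled index in place, the two components of a Series (index and data) can be inspected separately, and items can be looked up by label or by position. This is a minimal sketch using a shortened copy of the data above; the variable names are chosen for illustration.

```python
import pandas as pd

# Shortened, illustrative version of the grades and student IDs
grades = [88, 67, 100]
students = ["dmac", "edev", "joeb"]
s = pd.Series(grades, index=students)

print(s.index.tolist())   # the index labels
print(s.to_numpy())       # the data values

edev_grade = s["edev"]    # look up by label
first_grade = s.iloc[0]   # look up by position
print(edev_grade, first_grade)
```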

Understanding what type of object you're working with is important in any programming language. Different objects (classes) have different methods and attributes. A pandas Series object has different methods and attributes than a pandas DataFrame. Below is a partial listing of the methods available with Series objects.
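
One way to check which object you have is Python's built-in `type()`. The sketch below uses a small, made-up DataFrame to show that selecting one column returns a Series, while selecting a list of columns returns a DataFrame.

```python
import pandas as pd

# Hypothetical two-column DataFrame for illustration
df = pd.DataFrame({"name": ["Ana", "Ben"], "score": [90, 85]})

column = df["score"]     # single column name: a Series
table = df[["score"]]    # list of column names: a DataFrame

print(type(column).__name__)
print(type(table).__name__)
```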

## Series functions

### head()/tail()
View the first few or last few items in a Series using `head()` or `tail()`. Both return five items by default.

In [None]:
grade_series.head()

In [None]:
grade_series.tail()
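
Both methods also accept the number of items to return. A short sketch with illustrative data:

```python
import pandas as pd

# Ten illustrative values: 0 through 9
numbers = pd.Series(range(10))

first_three = numbers.head(3)   # first 3 items
last_two = numbers.tail(2)      # last 2 items
default_head = numbers.head()   # 5 items when no count is given

print(first_three)
print(last_two)
```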

### Math functions
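* describe() - displays descriptive statistics of your data
* sum() - returns the sum of the values
* min() - returns the smallest value
* max() - returns the largest value
* mean() - returns the average of the values
* median() - returns the middle value
* std() - returns the sample standard deviation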

In [None]:
grade_series.describe()
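
The individual functions can also be called on their own. By default they skip missing values (NaN). The sketch below uses a small, made-up set of scores so the results are easy to verify by hand.

```python
import pandas as pd

# Illustrative scores with one missing value
scores = pd.Series([80.0, 90.0, None, 100.0])

total = scores.sum()         # NaN is skipped
average = scores.mean()
middle = scores.median()
spread = scores.std()        # sample standard deviation
n_present = scores.count()   # number of non-missing values

print(total, average, middle, spread, n_present)
```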

### Data Manipulation Functions
* isnull() - checks for missing values (null/NaN)
* unique() - returns an array of unique values
* value_counts() - returns the frequencies of unique values
* apply(function) - applies a function to each element
* dropna() - returns a new series with missing values removed

### isnull()
Use ```isnull()``` to check for null values. It returns a True/False series with one entry for each item in the original series. True indicates that the value is null (NaN, which stands for "Not a Number").

In [None]:
grade_series.isnull()
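
Because True counts as 1 when summed, the result of `isnull()` can be used to count missing values or to filter them out. A minimal sketch with illustrative data:

```python
import pandas as pd

# Illustrative scores with two missing values
scores = pd.Series([88, None, 100, None])

missing_mask = scores.isnull()

# Summing a True/False series counts the True entries
n_missing = int(missing_mask.sum())

# ~ inverts the mask, keeping only the items that are present
present = scores[~missing_mask]

print(n_missing)
print(present)
```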

### unique() - Find unique values

In [None]:
print(grade_series.unique())
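
To see how often each value occurs rather than just which values occur, `value_counts()` can be used alongside `unique()`. The sketch below uses made-up scores.

```python
import pandas as pd

# Illustrative scores with repeated values
scores = pd.Series([100, 95, 100, 88, 95, 100])

# unique() returns a NumPy array in order of first appearance
distinct = scores.unique()

# value_counts() returns a Series: values as the index, counts as the data,
# sorted from most to least frequent
counts = scores.value_counts()

print(distinct)
print(counts)
```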

### apply() function
Use the apply function to modify every item in a series using a standard or custom function. In this example, we use a custom function to create a series containing the letter grade.

In [None]:
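def number_to_letter_grade(score):
    """Convert a numeric score to a letter grade (A through D)."""
    if score > 89:
        return "A"
    elif score > 79:
        return "B"
    elif score > 69:
        return "C"
    elif score > 59:
        return "D"
    else:
        # Scores of 59 or below, and missing values (NaN), get no letter grade
        return None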

Apply the function to each grade to build a new series:

In [None]:
letter_series = grade_series.apply(number_to_letter_grade)

In [None]:
letter_series
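
For short, one-off transformations, `apply()` also accepts a lambda instead of a named function. The following sketch uses made-up scores; the 5-point curve and the passing mark of 70 are assumptions chosen only for illustration.

```python
import pandas as pd

# Illustrative scores
scores = pd.Series([88, 67, 100])

# Add 5 points to each score, capped at 100 (hypothetical curve)
curved = scores.apply(lambda s: min(s + 5, 100))

# Flag scores that meet a hypothetical passing mark of 70
passed = scores.apply(lambda s: s >= 70)

print(curved)
print(passed)
```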

### dropna()
Use ```dropna()``` to return a new series with the missing values removed.

In [None]:
grades_no_missing = grade_series.dropna()
grades_no_missing

Notice that the missing (None/NaN) item has been removed. The original `grade_series` is unchanged.
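
The remaining items keep their original index labels, so the index may have gaps after missing values are dropped. This sketch, with illustrative data, shows the gap and one way to renumber the index using `reset_index(drop=True)`.

```python
import pandas as pd

# Illustrative scores with one missing value
scores = pd.Series([88, None, 100])

# dropna() returns a new Series and leaves `scores` untouched
cleaned = scores.dropna()
print(len(scores), len(cleaned))
print(cleaned.index.tolist())      # label 1 is gone

# Optionally renumber the index from 0
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())
```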

# Indexes

# Locating and Filtering data

In [None]:
import pandas as pd
df_people = pd.read_csv('files/people_data.csv')
df_people.head()

## Using iloc
The ```iloc``` indexer in pandas selects data from a DataFrame based on integer positions, which start at 0. It lets you specify row and column positions to access specific data points or subsets of the DataFrame.

The general syntax of iloc is:

```Python
df.iloc[row_index(s), column_index(s)]
```

In [None]:
# Return the first row (position 0). Since it is one-dimensional, it is returned as a Series.
df_people.iloc[0]

In [None]:
# Return the first 8 rows and first 2 columns (the leading 0 can be omitted)
df_people.iloc[0:8,0:2]

In [None]:
# Return the rows at positions 1, 3, and 5 and show only the first name, last name, and salary (columns 0, 1, and 9)
df_people.iloc[[1,3,5],[0,1,9]]
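
The same patterns can be tried without the CSV file. The sketch below builds a small DataFrame whose column names and values are invented for illustration, then selects by position in the three ways shown above.

```python
import pandas as pd

# Hypothetical stand-in for the people data
df = pd.DataFrame({
    "first": ["Ana", "Ben", "Cy", "Di"],
    "last": ["Ng", "Ortiz", "Park", "Quinn"],
    "salary": [50000, 62000, 58000, 71000],
})

# A single position returns one row as a Series
first_row = df.iloc[0]

# Slices exclude the end position: rows 0-1 and columns 0-1
top_left = df.iloc[:2, :2]

# Lists select specific positions: rows 1 and 3, columns 0 and 2
picked = df.iloc[[1, 3], [0, 2]]

print(first_row)
print(top_left)
print(picked)
```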

## Using loc
The ```loc``` indexer in pandas selects data from a DataFrame based on labels. It lets you specify row and column labels to access specific data points or subsets of the DataFrame. It also accepts a boolean Series, in which case only the rows marked True are returned.

The general syntax of loc is:

```Python
df.loc[row_label(s), column_label(s)]
```

In [None]:
# Build a True/False Series that is True for each married person
married_filter = df_people['Marital Status'] == "Married"

In [None]:
df_people.loc[married_filter]
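
Here is a self-contained sketch of label-based selection. The DataFrame, its row labels, and its column names are invented for illustration. Note that, unlike position slices, label slices include the end label.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {
        "status": ["Married", "Single", "Married"],
        "salary": [50000, 62000, 58000],
    },
    index=["ana", "ben", "cy"],
)

# One row label and one column label return a single value
ben_salary = df.loc["ben", "salary"]

# Label slices include both endpoints
through_ben = df.loc["ana":"ben", ["salary"]]

# A boolean Series filters rows; a column label can be given as well
married_salaries = df.loc[df["status"] == "Married", "salary"]

print(ben_salary)
print(through_ben)
print(married_salaries)
```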

# Modifying data

# Add / Remove Rows and Columns

# Grouping and Aggregating

# Handling Missing Values

# Casting Data Types

# Working with Time Series Data

# Interacting with Excel, JSON, Parquet files, SQL


# Using Polars with Pandas
