
# PANDAS
## Introduction
Pandas is a popular Python library that provides data structures and functions designed specifically for data analysis workflows. The name "Pandas" comes from "panel data", an econometrics term for multidimensional data sets.


## Data Structures

The two most important data structures in Pandas are:

* Series: A Series is a one-dimensional array-like object that can hold values such as integers, floats, strings, or booleans. What distinguishes a Series from a plain array is its index, which assigns a label to each value. The index makes it easier to access, query, and analyze the data. For example, you can select or slice values by label, much as you would look up cells by row label in a spreadsheet.

* DataFrame: A DataFrame is a two-dimensional tabular data structure with labeled rows and columns, akin to a spreadsheet, SQL table, or R data frame. It builds on the Series concept by storing a number of Series objects aligned along a shared index. This enables multivariate, relational data analysis. DataFrames also support handling missing data, merging datasets, pivoting data, and more.

Pandas combines these versatile data structures with numerous functions and methods designed to make many aspects of the typical data analysis workflow fast, efficient, and productive in Python. It excels at tasks like loading, cleaning, transforming, merging, reshaping, and visualizing data.
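To make the relationship between the two structures concrete, here is a minimal sketch that builds a DataFrame from two Series and pulls one column back out. The data and the names `heights`, `ages`, and `people` are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: two Series sharing the same index labels
heights = pd.Series([1.60, 1.75, 1.82], index=["ana", "ben", "cy"])
ages = pd.Series([31, 25, 40], index=["ana", "ben", "cy"])

# A dict of Series becomes a DataFrame; the dict keys become column names
people = pd.DataFrame({"height_m": heights, "age": ages})

# Selecting one column gives back a Series that carries the shared index
age_column = people["age"]

print(people)
print(type(age_column).__name__)
```

Each column of `people` is a Series, and all columns share the row labels `ana`, `ben`, and `cy`.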

### Series

* A Pandas Series is a one-dimensional array-like data structure that can hold values of types such as integers, floats, strings, and booleans.
* It has an index that labels each value, much like row labels in a spreadsheet, so values can be looked up by label as well as by position.

The basic method to create a `Series` is to call:

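`s = pd.Series(data, index=index)`

Here `data` can be a list, a NumPy array, a dictionary, or a scalar value, and `index` is an optional sequence of labels. If `index` is omitted for list or array data, Pandas assigns integer labels starting at 0.

#### Creating a Series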

```python
import numpy as np
import pandas as pd

# Create from a list (default integer index 0..n-1)
mylist = [1, 2, 3, 4]
myseries1 = pd.Series(mylist)

# Create from a NumPy array
arr = np.array([1, 2, 3, 4])
myseries2 = pd.Series(arr)

# Create from a dictionary (keys become the index labels)
mydict = {'a': 1, 'b': 2, 'c': 3}
myseries3 = pd.Series(mydict)

# Create from a scalar value (repeated for every index label)
myseries4 = pd.Series(5, index=[0, 1, 2, 3])
```

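```python
print(myseries1)
print(myseries2)
print(myseries3)
print(myseries4)
```

```
0    1
1    2
2    3
3    4
dtype: int64
0    1
1    2
2    3
3    4
dtype: int32
a    1
b    2
c    3
dtype: int64
0    5
1    5
2    5
3    5
dtype: int64
```

The `int32` dtype shown for `myseries2` comes from NumPy's default integer size on the machine where this was run. On many systems the same code reports `int64`.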

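The `index` argument can also carry custom labels. The following is a small sketch with made-up values (`labeled`, `prices`, and `selected` are illustrative names) showing explicit labels for a list, and how `index` selects and orders keys when the data is a dictionary.

```python
import pandas as pd

# Explicit labels for a list: the index length must match the data length
labeled = pd.Series([10, 20, 30], index=["x", "y", "z"])

# With a dict, `index` selects and orders the keys;
# labels that are not keys of the dict get NaN
prices = {"apple": 1.5, "pear": 2.0}
selected = pd.Series(prices, index=["pear", "apple", "kiwi"])

print(labeled["y"])
print(selected)
```

Because `kiwi` is not a key of `prices`, its entry is a missing value, and the Series is stored as floating point to accommodate it.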

#### Series Attributes

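```python
print(myseries4.values)  # The underlying data as a NumPy array
print(myseries4.index)   # The index labels
print(myseries4.dtype)   # The data type (e.g. int64, float64, object)
print(myseries4.name)    # Name of the Series (None if not set)
print(myseries4.shape)   # Shape tuple: (number of elements,)
```

```
[5 5 5 5]
Int64Index([0, 1, 2, 3], dtype='int64')
int64
None
(4,)
```

Pandas 2.0 and later print the index as `Index([0, 1, 2, 3], dtype='int64')` rather than `Int64Index(...)`.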

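The `name` attribute is `None` unless one is assigned. As a minimal sketch with made-up temperature readings (`temps` and `frame` are illustrative names), a name can be set at creation time, and it becomes the column label if the Series is converted to a DataFrame.

```python
import pandas as pd

# Hypothetical readings; `name` labels the Series as a whole
temps = pd.Series([21.5, 19.0, 23.2], index=["mon", "tue", "wed"], name="temp_c")

print(temps.name)   # temp_c
print(temps.size)   # 3, the number of elements as a plain integer

# The Series name is used as the column name
frame = temps.to_frame()
print(list(frame.columns))  # ['temp_c']
```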

#### Accessing elements

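```python
# myseries1 uses the default integer index, so 1 is treated as a label here
print(myseries1[1])
# print(myseries1['a'])  # would raise KeyError: myseries1 has no label 'a'
```

```
2
```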

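Square-bracket access can be ambiguous when the labels themselves are integers. Pandas also provides `.loc` for label-based access and `.iloc` for position-based access. A short sketch, using a made-up Series `s` with string labels:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s.loc["b"]         # value stored under label "b"
by_position = s.iloc[0]       # first value, regardless of its label

label_slice = s.loc["a":"b"]  # label slices include both endpoints
pos_slice = s.iloc[0:1]       # position slices exclude the stop, as in Python lists

print(by_label, by_position)
print(label_slice)
print(pos_slice)
```

Note the difference in slicing: `s.loc["a":"b"]` returns two values, while `s.iloc[0:1]` returns one.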

#### Operations with Series

```python
doubled = myseries1 * 2              # Arithmetic (applied element-wise)
filtered = myseries1[myseries1 > 2]  # Filtering with a boolean mask
sorted1 = myseries1.sort_values()    # Sorting by value
total = myseries1.sum()              # Aggregation

print(doubled)
```

```
0    2
1    4
2    6
3    8
dtype: int64
```

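Arithmetic between two Series is matched on index labels rather than on position. The sketch below uses made-up regional figures (`q1`, `q2`, `combined`, and `filled` are illustrative names) to show what happens when the labels only partly overlap.

```python
import pandas as pd

q1 = pd.Series([100, 200, 300], index=["north", "south", "east"])
q2 = pd.Series([10, 20, 30], index=["south", "east", "west"])

# Values are added label by label; labels present in only one Series give NaN
combined = q1 + q2

# The add() method with fill_value=0 treats a missing label as 0 instead
filled = q1.add(q2, fill_value=0)

print(combined)
print(filled)
```

In `combined`, only `south` and `east` have numeric results, because those are the labels both Series share. In `filled`, `north` keeps its value of 100 and `west` keeps its value of 30.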