<img src="Data/UP Data Science Society Logo 2.png" width=700>

# pandas Series and DataFrame

### Structure
```
├── Introduction to Series and DataFrame
│   ├── Dimension of pandas data structure
├── Series
│   ├── Creating a simple Series object
│   ├── Understanding indexes better
├── DataFrame
│   ├── Creating a simple DataFrame object
│   ├── The Differences of a Series over a DataFrame
│   ├── Column-oriented vs Row-oriented Thinking
│   ├── Understanding indexes better
├── Deep Dive into Series and DataFrames
│   ├── Attributes
│   ├── Operators
│   ├── Methods
```

___

## Introduction to Series and DataFrame
There are two main data structures in the pandas package for working with array and tabular data: Series and DataFrame.

To make this easier to picture:
* a Series is similar to a single column of data,
* a DataFrame is like the rows and columns you see whenever you open a spreadsheet, and
* a DataFrame can also be thought of as a collection of several Series objects placed side by side as columns.

We start with Series and DataFrame because knowing how each one works, and when each is the better fit, will help in all the succeeding lessons. We will spend a good amount of time here so that you understand how pandas works and how to use the package effectively.

### Dimensions of the pandas data structures
To better distinguish between the two, Table 3.1 shows the dimensionality of a Series and a DataFrame, along with their closest counterparts in spreadsheets, databases, and linear algebra.

<center>Table 3.1: Dimensions of pandas data structures</center>
<center>

| Data Structure | Dimensionality | Spreadsheet | Database | Linear Algebra |
|----------------|----------------|-------------|----------|----------------|
| Series         | 1D             | Column      | Column   | Column Vector  |
| DataFrame      | 2D             | Single Sheet| Table    | Matrix         |

</center>

<br>

___

## Series

A Series is used to model one-dimensional data. 

The Series object also has a few more bits of data, including an index and a name. 

A recurring idea throughout pandas is the notion of an axis. Because a Series is one-dimensional, it has a single axis: the index.

### Creating a simple Series object using pandas

Creating a Series in pandas is simple. In the example below, we use Python and pandas to create a Series holding four Math 17 test scores.

In [38]:
import pandas as pd

Math17Scores = pd.Series([60.5,61,60,63], name = 'Math 17 test scores')

print(Math17Scores)

0    60.5
1    61.0
2    60.0
3    63.0
Name: Math 17 test scores, dtype: float64


You may be wondering... 

*_```I thought you said that Series is a single column of data? Why do we have 2 columns in the example?```_*

The leftmost column is what we call the *_index_*. The index labels each value but is not part of the values themselves.

The generic name for an index is an axis, and the values of the index — 0, 1, 2, 3 — are called axis labels. 

The data — 60.5, 61.0, 60.0, and 63.0 — is also called the values of the series
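The three pieces described above (index, values, and name) can each be pulled out of a Series separately. The following is a minimal sketch that rebuilds the same Math 17 Series; the variable names `math17_scores`, `index_labels`, `values`, and `series_name` are illustrative and not part of the original example.

```python
import pandas as pd

# Same data as the Math 17 example above
math17_scores = pd.Series([60.5, 61, 60, 63], name='Math 17 test scores')

# The axis labels (the leftmost column in the printed output)
index_labels = math17_scores.index.tolist()

# The data itself, converted to a plain Python list
values = math17_scores.tolist()

# The name given when the Series was created
series_name = math17_scores.name

print(index_labels)
print(values)
print(series_name)
```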


### Understanding indexes better
Having a separate index on top of the values may seem unnecessary at first glance, since a Python list already has integer positions. But pandas has a trick up its sleeve: index labels do not have to be integers. The data structure supports other index types such as strings and dates, as well as arbitrarily ordered or even duplicate index values.

The index is a core feature of pandas’ data structures given the library’s past in analysis of financial data or time-series data. Many of the operations performed on a Series operate directly on the index or by index lookup.
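To make the non-integer index types mentioned above more concrete, here is a small sketch with string labels, date labels, and duplicate labels. The student names, dates, and retake scores are made up purely for illustration.

```python
import pandas as pd

# String labels instead of the default 0, 1, 2, 3 (names are hypothetical)
scores_by_name = pd.Series(
    [60.5, 61, 60, 63],
    index=['Ana', 'Ben', 'Cara', 'Dan'],
    name='Math 17 test scores',
)
ben_score = scores_by_name['Ben']  # look up a value by its label

# Date labels, as is common with time-series data (dates are hypothetical)
exam_dates = pd.to_datetime(['2024-01-08', '2024-01-15', '2024-01-22', '2024-01-29'])
scores_by_date = pd.Series([60.5, 61, 60, 63], index=exam_dates)
second_exam = scores_by_date.loc[pd.Timestamp('2024-01-15')]

# Duplicate labels are allowed; a lookup then returns every matching row
retakes = pd.Series([60, 75, 100], index=['Jom', 'Jom', 'Nico'])
jom_scores = retakes['Jom']  # a Series with two values

print(ben_score)
print(second_exam)
print(jom_scores)
```

Notice that looking up a duplicated label returns a Series rather than a single value, which is worth keeping in mind when an index is not unique.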

We now move on to DataFrames, where the role of the index becomes more intuitive.

___

## DataFrames

In pandas, the two-dimensional counterpart to the one-dimensional Series is the DataFrame. 

It is better understood as a collection of Series. Each column in a DataFrame can be a Series on its own.

### Creating a simple DataFrame using pandas

Manually creating a pandas DataFrame object is a bit more involved than creating a Series, because we are now dealing with several columns, each of which is its own Series.

Continuing with the Math 17 scores, we now add a Name and a degree Program column to give our table more detail. We will record the scores Jom and Nico got on their Math 17 exam during their freshie year.

In [39]:
import pandas as pd

Math17Scores_2 = pd.DataFrame(
    {
        'Name':['Jom','Nico'],
        'Program':['BSIE','BSMath'],
        'Score':[60,100]
    }
)

print(Math17Scores_2)

   Name Program  Score
0   Jom    BSIE     60
1  Nico  BSMath    100
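To see the "collection of Series" idea in action, the sketch below selects one column from the same table and checks its type, then goes the other way and builds a DataFrame out of separate Series objects. The variable names are illustrative only.

```python
import pandas as pd

math17_scores_2 = pd.DataFrame({
    'Name': ['Jom', 'Nico'],
    'Program': ['BSIE', 'BSMath'],
    'Score': [60, 100],
})

# Selecting a single column gives back a Series named after the column
score_column = math17_scores_2['Score']
print(type(score_column))
print(score_column.name)

# Going the other way: separate Series objects become the columns of a DataFrame
names = pd.Series(['Jom', 'Nico'])
scores = pd.Series([60, 100])
rebuilt = pd.DataFrame({'Name': names, 'Score': scores})
print(rebuilt)
```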


### Breaking down the differences between Series and DataFrame

The key difference is the number of axes. A Series has a single axis, the index. A DataFrame has two: the index, which labels the rows, and the columns, which label each Series it contains.
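One way to check the number of axes yourself is through the `ndim`, `shape`, and `axes` attributes. This is a minimal sketch that reuses the data from the earlier examples; the variable names `scores` and `table` are illustrative.

```python
import pandas as pd

scores = pd.Series([60.5, 61, 60, 63], name='Math 17 test scores')
table = pd.DataFrame({
    'Name': ['Jom', 'Nico'],
    'Program': ['BSIE', 'BSMath'],
    'Score': [60, 100],
})

# A Series has one dimension, so its shape has one entry
print(scores.ndim, scores.shape)

# A DataFrame has two dimensions: (number of rows, number of columns)
print(table.ndim, table.shape)

# axes lists the labels along each dimension
series_axes = scores.axes     # [index]
dataframe_axes = table.axes   # [index, columns]
print(len(series_axes), len(dataframe_axes))
```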

### Column-oriented vs Row-oriented Thinking

If you have some experience with spreadsheets, you may have been trained to think in a row-oriented way, where the set of values in a row belongs together. In our Math17Scores_2 example, you may intuitively think that getting Jom's details means looking at row index 0 to pull Jom|BSIE|60. 

In practice, however, you will often deal with datasets where the analysis focuses on a particular column, so you simply pull that column and perform your operations on it. In our example, you might want to analyze how each degree program performed in Math 17, or get the standard deviation of the students' scores. Training yourself to be comfortable with index operations, especially in DataFrames, will benefit you greatly throughout your data science journey. We will discuss this further in our deep dive into Series and DataFrames below.
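The two ways of thinking can be sketched side by side using the same two-student table. The use of `.loc` for the row lookup and `groupby` for the per-program average is one possible approach, shown here only as a preview; the variable names are illustrative.

```python
import pandas as pd

math17_scores_2 = pd.DataFrame({
    'Name': ['Jom', 'Nico'],
    'Program': ['BSIE', 'BSMath'],
    'Score': [60, 100],
})

# Row-oriented: pull everything about Jom from row label 0
jom_row = math17_scores_2.loc[0]
print(jom_row)

# Column-oriented: pull one column and operate on it as a whole
scores = math17_scores_2['Score']
mean_score = scores.mean()
std_score = scores.std()  # sample standard deviation (pandas default)
print(mean_score, std_score)

# Average score per degree program
program_means = math17_scores_2.groupby('Program')['Score'].mean()
print(program_means)
```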

___


### CHECKPOINT 1: Creating your own Series and DataFrame objects

Referring to the examples above, 
* Create a Series object containing all the colors of the rainbow 
* Create a DataFrame containing a list of 10 of your friends, their gender, their age, and their favorite food. 

The code block for each has been started for your convenience. 

To run your code, you may click on the triangle on the left side of the code block or press ```Shift + Enter``` after you've written your code

In [40]:
# Modify this for your Series object

import pandas as pd


In [41]:
# Modify this for your DataFrame object

import pandas as pd

___

## Deep Dive into Series and DataFrames

There are many ways to inspect, manipulate, and do operations with a Series and a DataFrame. In this part, we will start introducing them.

To make the learning a bit more structured, we shall start with their respective ```attributes``` then proceed with their ```operators``` and ```methods```. 

### Series and DataFrame Attributes
Attributes are properties or characteristics associated with an object that provide information about the object's state or features. In even simpler terms, attributes give you details about what the object is like or what it contains.

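Here are ten (10) common attributes and inspection methods that you can use when inspecting a Series/DataFrame object. Items 1 to 7 are attributes, which are accessed without parentheses. Items 8 to 10 are technically methods, which are called with parentheses, but they are listed here because they serve the same purpose of quickly inspecting an object.

Namely:
1. `dtype` - returns the data type of the elements in a Series (for a DataFrame, use `dtypes`, which returns the data type of each column)
2. `index` - returns the index (row labels) of the Series/DataFrame
3. `values` - returns the values of the Series/DataFrame as a NumPy array
4. `size` - returns the number of elements in the Series/DataFrame
5. `shape` - returns a tuple with the number of elements along each dimension, e.g. `(10,)` for a Series of ten values or `(5, 3)` for a DataFrame with five rows and three columns
6. `name` - returns the name of the Series (a DataFrame has no `name`; its column labels are in `columns`)
7. `empty` - returns a boolean value indicating if the Series/DataFrame is empty or not
8. `head()` - returns the first 'n' elements of the Series or rows of the DataFrame. By default, it returns the first 5
9. `tail()` - returns the last 'n' elements of the Series or rows of the DataFrame. By default, it returns the last 5
10. `unique()` - returns the unique values in a Series (not available on a whole DataFrame; call it on a single column instead)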

To use an attribute, write the name of your data object, followed by "." and the attribute name. Methods additionally need parentheses. 

(e.g. s.empty, s.head()) 

An example of how to use the attributes is shown below. Feel free to experiment in the code blocks below to familiarize yourself with Series/DataFrame attributes.

In [57]:
# Test the attributes on this simple Series object

import pandas as pd

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
s = pd.Series(data)

s.dtype  # Swap in other attributes here, e.g. s.shape or s.size


dtype('int64')

In [43]:
# Test the attributes on this simple DataFrame object

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'Salary': [50000, 60000, 70000, 80000, 90000]}
df = pd.DataFrame(data)

df.shape

(5, 3)
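As a quick tour of the list above, the sketch below tries several of the attributes and methods on a small Series and DataFrame. The data is made up for illustration; note which calls use parentheses and which do not.

```python
import pandas as pd

s = pd.Series([10, 20, 20, 30], name='sample')
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Attributes: no parentheses
print(s.dtype)                    # data type of the Series
print(s.size, s.shape, s.empty)   # element count, dimensions, emptiness
print(s.name)                     # name of the Series
print(df.dtypes)                  # one data type per column
print(df.size, df.shape, df.empty)

# Methods: parentheses, optionally with an argument
first_two = s.head(2)
last_two = s.tail(2)
unique_values = s.unique()
unique_ages = df['Age'].unique()  # unique() works on a single column
print(first_two, last_two, unique_values, unique_ages)
```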

### Series and DataFrame Operators

In pandas, Series and DataFrame objects support various operators for performing common data manipulation tasks. 

#### Arithmetic Operators
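* Arithmetic operators (+, -, *, /, **, %) perform element-wise operations between corresponding elements of two Series or DataFrames, or between a Series/DataFrame and a scalar value. 
* "Corresponding" means elements that share the same index label (and, for DataFrames, the same column name). A label that appears in only one of the two operands produces NaN in the result.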

<center>

| Operator       | Description    |
|----------------|----------------|
| +              | Addition       |
| -              | Subtraction    |
| *              | Multiplication |
| /              | Division       |
| **             | Exponentiation |
| %              | Modulo         |

</center>

In [46]:
# Series

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4.5, 3.5, 2.5])
result_series = s1 + s2  # Change the operator in this line and see the effect
print("\nResult (Series): ")
print(result_series)

# DataFrame

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 3, 2], 'B': [1, 2, 3]})

result_dataframe = df1 + df2 # Change the operator in this line and see the effect
print("\nResult (DataFrame): ")
print(result_dataframe)




Result (Series): 
0    5.5
1    5.5
2    5.5
dtype: float64

Result (DataFrame): 
   A  B
0  5  5
1  5  7
2  5  9
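The example above covers two objects with matching indexes. The sketch below adds the two remaining cases: an operation with a scalar, and an operation between two Series whose index labels only partly overlap. The labels `w`, `x`, `y`, `z` are arbitrary and chosen only for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])

# A scalar is applied to every element
scaled = s1 * 10
remainder = s1 % 2
print(scaled)
print(remainder)

# Two Series with partly overlapping labels are matched by label, not position
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])
aligned = a + b
print(aligned)  # labels found in only one Series become NaN
```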


Between two string values in a pandas Series/DataFrame, subtraction, multiplication, and division are not defined the way they are for numerical data types. 

However, you can still perform other operations such as concatenation using the + operator.

In [53]:
# Strings - Series
s3 = pd.Series(['hello', 'pandas', ' '])
s4 = pd.Series([' world', ' is a powerful ', 'library'])

result = s3 + s4  # Element-wise string concatenation
print("\nResult (String concatenation):")
print(result)

# Strings - DataFrame
df1 = pd.DataFrame({'A': ['hello', 'pandas', ' '],
                    'B': ['world', 'is', 'a'],
                    'C': ['beautiful', 'a', 'powerful']})

df2 = pd.DataFrame({'D': ['library', 'for', 'data'],
                    'E': ['analysis', 'science', 'processing'],
                    'F': ['in', 'and', 'Python']})

# Concatenate corresponding strings element-wise
result_concatenation = df1['A'] + df2['D']
print("\nResult (String concatenation for column 'A' with column 'D'):")
print(result_concatenation)

result_concatenation = df1['B'] + df2['E']
print("\nResult (String concatenation for column 'B' with column 'E'):")
print(result_concatenation)

result_concatenation = df1['C'] + df2['F']
print("\nResult (String concatenation for column 'C' with column 'F'):")
print(result_concatenation)


Result (String concatenation):
0              hello world
1    pandas is a powerful 
2                  library
dtype: object

Result (String concatenation for column 'A' with column 'D'):
0    hellolibrary
1       pandasfor
2            data
dtype: object

Result (String concatenation for column 'B' with column 'E'):
0    worldanalysis
1        isscience
2      aprocessing
dtype: object

Result (String concatenation for column 'C' with column 'F'):
0       beautifulin
1              aand
2    powerfulPython
dtype: object
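To see what "not applicable" looks like in practice, the sketch below tries to subtract one string Series from another and catches the resulting error. It assumes the strings are stored with the `object` dtype, as in the output above, and sets that dtype explicitly; the variable names are illustrative.

```python
import pandas as pd

# dtype=object is set explicitly to match the "dtype: object" output above
left = pd.Series(['hello', 'pandas'], dtype=object)
right = pd.Series([' world', ' library'], dtype=object)

# Concatenation works element-wise
joined = left + right
print(joined)

# Subtraction between strings raises a TypeError
try:
    left - right
    subtraction_failed = False
except TypeError as error:
    subtraction_failed = True
    print('Subtraction is not defined for strings:', error)
```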


#### Comparison Operators
* Comparison operators (==, !=, <, >, <=, >=) perform element-wise comparison between corresponding elements of two Series or DataFrames, or between a Series/DataFrame and a scalar value. 
* You can use comparisons on various data types such as strings, booleans, and integers. 
* The result is a boolean Series or DataFrame with the same shape as the input.

<center>

| Operator       | Description              |
|----------------|--------------------------|
| ==             | Equal to                 |
| !=             | Not equal to             |
| <              | Less than                |
| >              | Greater than             |
| <=             | Less than or equal to    |
| >=             | Greater than or equal to |

</center>

In [48]:
# Series
s5 = pd.Series([True, True, False])
s6 = pd.Series([False, True, False])
result = s5 <= s6  # Element-wise "less than or equal to" (False counts as 0, True as 1)
print("\nComparison Result (Series):")
print(result)

# DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 3, 2], 'B': [1, 2, 3]})

# Comparison
result_comparison = df1 > df2
print("\nComparison result (DataFrame):")
print(result_comparison)



Comparison Result (Series):
0    False
1     True
2     True
dtype: bool

Comparison result (DataFrame):
       A     B
0  False  True
1  False  True
2   True  True
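Comparisons against a single value are just as common as comparisons between two objects. The following sketch compares a numeric Series, a string Series, and a DataFrame against a scalar; the passing mark of 61 and the variable names are assumptions made for the example.

```python
import pandas as pd

# Numeric comparison against a scalar (61 is a hypothetical passing mark)
scores = pd.Series([60.5, 61, 60, 63])
passed = scores >= 61
print(passed)

# Because True counts as 1, summing a boolean Series counts the matches
number_passed = int(passed.sum())
print(number_passed)

# String comparison against a scalar
names = pd.Series(['Jom', 'Nico', 'Jom'])
is_jom = names == 'Jom'
print(is_jom)

# DataFrame comparison against a scalar
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
greater_than_two = df > 2
print(greater_than_two)
```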
