## From NumPy Arrays to Pandas Series and DataFrames 

#### Terminology:
* **List**: A list in Python is an ordered, mutable collection of items that can contain elements of different data types. Lists support various operations like indexing, slicing, appending, and modifying, making them a versatile tool for managing and manipulating data.
* **Array**: An array in Python, typically provided by libraries like NumPy, is a homogeneous, fixed-size data structure that stores elements of the same data type in a contiguous block of memory. Arrays are optimized for numerical computations and support advanced mathematical operations, offering more efficiency than Python lists for handling large datasets.
* **Lists vs. Arrays**
    * Data Type:
        * List: Can contain elements of different data types.
        * Array: Typically contains elements of the same data type.
    * Performance:
        * List: Slower for numerical operations due to its general-purpose nature.
        * Array: Faster and more memory-efficient for numerical operations, especially with large datasets.
    * Functionality:
        * List: Supports a wide range of operations but lacks advanced mathematical capabilities.
        * Array: Provides advanced mathematical functions and operations, especially with libraries like NumPy.

* **Series**: A one-dimensional array-like object that can hold any data type (integers, floats, strings, etc.). It is essentially a single column of data with an index, which can be labels or integers. Think of it like a list or a column in a spreadsheet.
* **DataFrame**: A two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). It can be thought of as a collection of Series objects, where each Series represents a column. DataFrames are similar to a table in a relational database or an Excel spreadsheet.
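
The list-versus-array differences above can be seen in a few lines. This is a minimal sketch; the variable names are chosen only for illustration.

```python
import numpy as np

# A list can mix data types, and `*` repeats the list rather than scaling it.
mixed_list = [1, 2.5, "three"]
numbers_list = [1, 2, 3]
doubled_list = numbers_list * 2       # [1, 2, 3, 1, 2, 3]

# A NumPy array holds a single dtype, and `*` is applied element by element.
numbers_array = np.array([1, 2, 3])
doubled_array = numbers_array * 2     # array([2, 4, 6])

# Mixing integers and floats upcasts every element to a common dtype.
upcast_array = np.array([1, 2.5, 3])

print(doubled_list)
print(doubled_array)
print(upcast_array.dtype)             # float64
```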


In [32]:
# Import necessary libraries
import numpy as np
import pandas as pd

### Example 1: Converting a 1D NumPy array to a Pandas Series

Example 1 uses basic numerical data to illustrate how to convert a 1-dimensional (1D) NumPy array to a Pandas Series.

In [19]:
array_1d = np.array([10, 20, 30, 40, 50])
series_1d = pd.Series(array_1d)
print(series_1d)

0    10
1    20
2    30
3    40
4    50
dtype: int64
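
As a small extension of Example 1, the sketch below attaches labels to the Series and converts it back to a NumPy array. The index labels and the name `values` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

array_1d = np.array([10, 20, 30, 40, 50])

# Optional index labels and a name (both hypothetical).
labeled = pd.Series(array_1d, index=['a', 'b', 'c', 'd', 'e'], name='values')

print(labeled['c'])                       # label-based lookup: 30
print(labeled.dtype == array_1d.dtype)    # the Series keeps the array's dtype

# to_numpy() converts the Series back to a NumPy array.
round_trip = labeled.to_numpy()
print(np.array_equal(round_trip, array_1d))
```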


### Example 2: Converting a 1D NumPy array to a Pandas DataFrame

Example 2 uses basic numerical data to illustrate how to convert a 1-dimensional (1D) NumPy array to a Pandas DataFrame.

In [20]:
array_1d = np.array([10, 20, 30, 40, 50])
df_1d = pd.DataFrame(array_1d, columns=['Values'])
df_1d.head()

|   | Values |
|---|--------|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |
| 4 | 50 |
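
There is more than one route from a 1D array to a single-column DataFrame. The sketch below compares the constructor used in Example 2 with two alternatives, `Series.to_frame` and a dictionary of columns; it is offered as an illustration, not as a recommendation of one route over another.

```python
import numpy as np
import pandas as pd

array_1d = np.array([10, 20, 30, 40, 50])

# Route used in Example 2: pass the array and name the column.
df_direct = pd.DataFrame(array_1d, columns=['Values'])

# Alternative 1: build a Series first, then promote it to a DataFrame.
df_from_series = pd.Series(array_1d).to_frame(name='Values')

# Alternative 2: map the column name to the array in a dictionary.
df_from_dict = pd.DataFrame({'Values': array_1d})

# All three produce the same 5-row, 1-column table.
print(df_direct.equals(df_from_series))
print(df_direct.equals(df_from_dict))
```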


### Example 3: Converting a 2D NumPy array to a Pandas DataFrame

Example 3 uses synthetic financial data to illustrate how to convert a 2-dimensional (2D) NumPy array to a Pandas DataFrame.


In [21]:
array_2d_financial = np.array([[100, 200], [300, 400], [500, 600]])
df_2d_financial = pd.DataFrame(array_2d_financial, columns=['Revenue', 'Profit'])
df_2d_financial.head()

|   | Revenue | Profit |
|---|---------|--------|
| 0 | 100 | 200 |
| 1 | 300 | 400 |
| 2 | 500 | 600 |
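
The Terminology section describes a DataFrame as a collection of Series. A minimal sketch of that idea, reusing the data from Example 3: selecting one column returns a Series that shares the DataFrame's index.

```python
import numpy as np
import pandas as pd

array_2d_financial = np.array([[100, 200], [300, 400], [500, 600]])
df = pd.DataFrame(array_2d_financial, columns=['Revenue', 'Profit'])

# Selecting a single column yields a Series.
revenue = df['Revenue']
print(type(revenue).__name__)            # Series

# The column Series carries the same row index as the DataFrame.
print(revenue.index.equals(df.index))    # True

# Each array row became a DataFrame row; each array column became a named column.
print(df.shape)                          # (3, 2)
```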


### Example 4: Converting a 2D NumPy array to a Pandas DataFrame

Example 4 uses synthetic student grades to illustrate how to convert a 2D NumPy array to a Pandas DataFrame.

In [22]:
array_2d_grades = np.array([[85, 90], [78, 82], [92, 88]])
df_2d_grades = pd.DataFrame(array_2d_grades, columns=['Math', 'Science'])
df_2d_grades.head()

|   | Math | Science |
|---|------|---------|
| 0 | 85 | 90 |
| 1 | 78 | 82 |
| 2 | 92 | 88 |


### Example 5: Converting a 3D NumPy array to a Pandas DataFrame

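Example 5 illustrates how to convert synthetic image data from a 3-dimensional (3D) NumPy array to a Pandas DataFrame. Because a DataFrame is two-dimensional, the array is first flattened so that each row holds one pixel and each column holds one color channel.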


In [24]:
array_3d_image = np.random.randint(0, 256, (2, 2, 3))  # Random 2x2 RGB image; the upper bound (256) is exclusive
df_3d_image = pd.DataFrame(array_3d_image.reshape(-1, 3), columns=['R', 'G', 'B'])
df_3d_image.head()

|   | R | G | B |
|---|---|---|---|
| 0 | 244 | 222 | 73 |
| 1 | 185 | 82 | 230 |
| 2 | 224 | 88 | 237 |
| 3 | 53 | 83 | 3 |
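
To make the flattening in Example 5 easier to follow, the sketch below uses a deterministic array instead of random values. The `row` and `col` columns are additions for illustration; they recover each pixel's position from the order in which `reshape(-1, 3)` lays out the rows.

```python
import numpy as np
import pandas as pd

height, width, channels = 2, 2, 3

# Deterministic stand-in for an image: values 0..11 arranged as 2 x 2 x 3.
image = np.arange(height * width * channels).reshape(height, width, channels)

# -1 lets NumPy infer the number of rows: height * width = 4 pixels.
flat = image.reshape(-1, channels)
df = pd.DataFrame(flat, columns=['R', 'G', 'B'])

# Pixels are flattened in row-major order, so row k of the DataFrame
# is the pixel at (k // width, k % width) in the original image.
positions = np.arange(len(df))
df['row'] = positions // width
df['col'] = positions % width
print(df)

# Reshaping the channel columns restores the original 3D array.
restored = df[['R', 'G', 'B']].to_numpy().reshape(height, width, channels)
print(np.array_equal(restored, image))   # True
```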


### Pandas Indexes

In Pandas, an index is the set of labels that identifies the rows of a DataFrame or Series. It lets you locate, slice, and operate on specific rows by label. Index labels often act as a unique ID for each row, although Pandas does not require them to be unique. Indexes can be numbers, dates, or strings, and you can customize them to suit your needs. Pandas also uses the index to align data automatically in arithmetic and when merging or joining datasets.
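
A minimal sketch of index-based alignment, using hypothetical store totals: when two Series are added, values are matched by label rather than by position. The last lines show that Pandas accepts repeated labels.

```python
import pandas as pd

# Hypothetical quarterly totals; note that q2 lists the stores in reverse order.
q1 = pd.Series([100, 150, 200], index=['Store 1', 'Store 2', 'Store 3'])
q2 = pd.Series([300, 250, 200], index=['Store 3', 'Store 2', 'Store 1'])

# Addition pairs values that share a label, regardless of their position.
total = q1 + q2
print(total['Store 1'])   # 100 + 200 = 300
print(total['Store 3'])   # 200 + 300 = 500

# Index labels are allowed to repeat; a lookup then returns every match.
repeated = pd.Series([1, 2, 3], index=['x', 'x', 'y'])
print(repeated.index.is_unique)   # False
print(len(repeated['x']))         # 2
```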

In Pandas, a default index is automatically created when you generate a DataFrame or Series without specifying an index. This default index is a simple numerical sequence that starts at 0 and increments by 1 for each row. For example, if you create a DataFrame with 5 rows, the default index will be 0, 1, 2, 3, and 4. This default indexing allows you to easily reference and access rows by their position in the DataFrame or Series.
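
The sketch below inspects the default index and shows one consequence worth knowing: after filtering, rows keep their original labels, so label-based access (`.loc`) and position-based access (`.iloc`) can diverge.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([10, 20, 30, 40, 50]))
print(s.index)                    # RangeIndex(start=0, stop=5, step=1)

# With the default index, label and position coincide.
print(s.loc[2], s.iloc[2])        # 30 30

# Filtering keeps the original labels, so they no longer start at 0.
filtered = s[s > 20]
print(filtered.index.tolist())    # [2, 3, 4]
print(filtered.iloc[0])           # 30, the first remaining row by position
print(filtered.loc[2])            # 30, the same row by label

# reset_index(drop=True) renumbers the rows from 0 again.
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```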

### Example 6: Converting a 2D NumPy array to a Pandas DataFrame with MultiIndex

Example 6 illustrates how to convert a 2-dimensional NumPy array to a Pandas DataFrame with a MultiIndex.

A MultiIndex in Pandas, also known as a hierarchical index, lets you use multiple levels of indexing on a DataFrame or Series. This is useful for datasets that are categorized along more than one dimension, for instance by both 'Country' and 'Year'. With a MultiIndex, you can select subsets of the data and perform operations at different levels of the hierarchy.

In the example below, the MultiIndex is built from the combinations of **Store** (Store 1, Store 2, Store 3) and **Quarter** (Q1, Q2), which gives six (Store, Quarter) pairs. Because the sample array has only three rows, the code keeps just the first three pairs with `index[:3]`: (Store 1, Q1), (Store 1, Q2), and (Store 2, Q1). To find the row for Store 1 in Q1, you provide both values together as the identifier of that row.

In [27]:
array_2d_sales = np.array([[100, 200], [150, 250], [200, 300]])
index = pd.MultiIndex.from_product([['Store 1', 'Store 2', 'Store 3'], ['Q1', 'Q2']])
df_2d_sales = pd.DataFrame(array_2d_sales, index=index[:3], columns=['Product A', 'Product B'])
df_2d_sales

|   |   | Product A | Product B |
|---|---|-----------|-----------|
| Store 1 | Q1 | 100 | 200 |
| Store 1 | Q2 | 150 | 250 |
| Store 2 | Q1 | 200 | 300 |
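
The sketch below shows how rows are selected from a MultiIndex. It assumes a six-row array so that every (Store, Quarter) pair has data; the three extra rows and the level names `Store` and `Quarter` are hypothetical additions, not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures: one row per (store, quarter) pair, six in total.
sales = np.array([[100, 200], [150, 250], [200, 300],
                  [120, 220], [170, 270], [210, 310]])

index = pd.MultiIndex.from_product(
    [['Store 1', 'Store 2', 'Store 3'], ['Q1', 'Q2']],
    names=['Store', 'Quarter'],
)
df = pd.DataFrame(sales, index=index, columns=['Product A', 'Product B'])

# A single row is identified by a tuple with one value per level.
store1_q1 = df.loc[('Store 1', 'Q1')]
print(store1_q1['Product A'])         # 100

# Passing only the first level returns every quarter for that store.
store_2 = df.loc['Store 2']
print(store_2.index.tolist())         # ['Q1', 'Q2']

# xs() takes a cross-section on an inner level: Q1 across all stores.
q1_only = df.xs('Q1', level='Quarter')
print(q1_only['Product A'].tolist())  # [100, 200, 170]
```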


### Example 7: Converting a 1D NumPy array to a Pandas DataFrame with DatetimeIndex

A DatetimeIndex in Pandas is a type of index specifically designed for handling time-series data. It allows you to index your data using datetime objects, which makes it easier to perform date-related operations and analysis. Here are some key features and uses of DatetimeIndex:

* **Time-Based Indexing**: You can easily select and slice data based on dates. For example, you can retrieve data for a specific day, month, or year.
* **Resampling**: You can resample your time series data to different frequencies, such as converting daily data to monthly data, or vice versa.
* **Time Series Analysis**: Functions like rolling, expanding, and time-based groupby operations become more intuitive and powerful with DatetimeIndex.
* **Handling Missing Data**: DatetimeIndex makes it easier to identify and handle missing dates in your time series data.

In [28]:
dates = pd.date_range('2023-01-01', periods=5)
array_1d_stock_prices = np.array([100, 102, 101, 105, 110])
df_1d_stock_prices = pd.DataFrame(array_1d_stock_prices, index=dates, columns=['Stock Price'])
df_1d_stock_prices

|   | Stock Price |
|---|-------------|
| 2023-01-01 | 100 |
| 2023-01-02 | 102 |
| 2023-01-03 | 101 |
| 2023-01-04 | 105 |
| 2023-01-05 | 110 |
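
The four DatetimeIndex features listed above can be sketched on the same stock-price data. The two-day resampling frequency, the two-row rolling window, and the artificially removed date are assumptions made only to keep the example small.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2023-01-01', periods=5)
prices = pd.DataFrame(np.array([100, 102, 101, 105, 110]),
                      index=dates, columns=['Stock Price'])

# Time-based indexing: date strings slice the rows, both ends inclusive.
jan_2_to_4 = prices.loc['2023-01-02':'2023-01-04']
print(len(jan_2_to_4))                        # 3

# Resampling: average the daily prices in two-day bins.
two_day_mean = prices.resample('2D').mean()
print(two_day_mean['Stock Price'].tolist())   # [101.0, 103.0, 110.0]

# Time series analysis: mean over a rolling window of two rows.
rolling_mean = prices['Stock Price'].rolling(window=2).mean()
print(rolling_mean.tolist())                  # [nan, 101.0, 101.5, 103.0, 107.5]

# Missing data: remove one day, then compare against the full daily range.
with_gap = prices.drop(pd.Timestamp('2023-01-03'))
expected = pd.date_range(with_gap.index.min(), with_gap.index.max(), freq='D')
missing = expected.difference(with_gap.index)
print(missing.tolist())                       # the single missing date

# reindex() restores the missing date as a row of NaN.
filled = with_gap.reindex(expected)
print(int(filled['Stock Price'].isna().sum()))  # 1
```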


### Example 8: Converting a 2D NumPy array to a Pandas DataFrame with custom index

Custom indexes in Pandas allow you to define your own labels for the rows of a DataFrame or Series instead of using the default numerical index. This can make your data easier to understand and work with, especially when the labels have meaningful significance in the context of your analysis.

In [30]:
array_2d_population = np.array([[1000, 2000], [1500, 2500], [1200, 2200]])
df_2d_population = pd.DataFrame(array_2d_population, index=['City A', 'City B', 'City C'], columns=['2019', '2020'])
df_2d_population


|   | 2019 | 2020 |
|---|------|------|
| City A | 1000 | 2000 |
| City B | 1500 | 2500 |
| City C | 1200 | 2200 |
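
With meaningful row labels in place, lookups read much like the question being asked. A minimal sketch using the population data above; the `growth` calculation is added only for illustration.

```python
import numpy as np
import pandas as pd

array_2d_population = np.array([[1000, 2000], [1500, 2500], [1200, 2200]])
df = pd.DataFrame(array_2d_population,
                  index=['City A', 'City B', 'City C'],
                  columns=['2019', '2020'])

# Look up a single value by row label and column label.
city_b_2020 = df.loc['City B', '2020']
print(city_b_2020)            # 2500

# Position-based access still works alongside the custom labels.
second_row = df.iloc[1]
print(second_row.name)        # City B

# Column arithmetic keeps the city labels on the result.
growth = df['2020'] - df['2019']
print(growth['City A'])       # 1000
```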


### Example 9: Converting a 1D NumPy array to a Pandas DataFrame with custom index and column names

In [31]:
array_1d_sports = np.array([20, 15, 25])
df_1d_sports = pd.DataFrame(array_1d_sports, index=['Player 1', 'Player 2', 'Player 3'], columns=['Goals'])
df_1d_sports

|   | Goals |
|---|-------|
| Player 1 | 20 |
| Player 2 | 15 |
| Player 3 | 25 |
