
### Pandas in Algorithmic Trading

**Pandas** is a powerful data manipulation and analysis library for Python. It provides data structures such as Series and DataFrame, which are essential for handling and analyzing time series data, such as stock prices, in algorithmic trading.

#### Key Features of Pandas

1. **Data Structures**: Provides Series (1D) and DataFrame (2D) for structured data.
2. **Time Series Analysis**: Built-in support for handling and manipulating time series data.
3. **Data Cleaning**: Tools for handling missing data, filtering, and aggregating data.
4. **Data Alignment**: Automatically aligns data based on labels, making operations intuitive.
5. **Integration**: Easily integrates with other libraries like NumPy, Matplotlib, and SciPy.
6. **IO Tools**: Functions for reading and writing data from various formats (CSV, Excel, SQL, etc.).

#### Use Cases in Algorithmic Trading

1. **Loading and Cleaning Data**: Importing historical price data from CSV, cleaning missing values, and normalizing data.
2. **Time Series Analysis**: Calculating moving averages, returns, and other indicators.
3. **Data Aggregation**: Grouping data by time intervals (e.g., daily, weekly) to perform aggregate calculations.
4. **Feature Engineering**: Creating new features for machine learning models, such as technical indicators.
5. **Backtesting**: Simulating trading strategies on historical data to evaluate performance.
6. **Visualization**: Plotting price trends, indicators, and performance metrics using integrated plotting functions.
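
As a minimal sketch of the data aggregation use case, the snippet below groups daily closes into weekly bins with `resample`. The prices are synthetic and the `"W"` (week ending Sunday) frequency is just one reasonable choice, not a recommendation for any particular strategy.

```python
import pandas as pd

# Ten synthetic daily closes starting Monday 2024-01-01 (illustrative, not market data)
dates = pd.date_range("2024-01-01", periods=10, freq="D")
close = pd.Series([float(p) for p in range(100, 110)], index=dates, name="Close")

# Resample to calendar weeks; with "W" each bin ends on a Sunday
weekly_last = close.resample("W").last()   # last close in each week
weekly_mean = close.resample("W").mean()   # average close in each week

print(weekly_last)
print(weekly_mean)
```

The same pattern works with other aggregations such as `max()`, `min()`, or `sum()` on the resampled object.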

### Comparison: Pandas vs. NumPy

| Feature/Aspect          | Pandas                                    | NumPy                                      |
|-------------------------|-------------------------------------------|--------------------------------------------|
| **Primary Data Structures** | Series (1D), DataFrame (2D)              | ndarray (n-dimensional array)               |
| **Use Case Focus**      | Data manipulation and analysis            | Numerical computation and operations       |
| **Time Series Handling**| Excellent support for time series         | Basic `datetime64` support; resampling needs extra code or libraries |
| **Data Cleaning**       | Built-in functions for handling missing data | Requires custom handling                    |
| **Data Alignment**      | Automatic alignment based on labels       | No alignment, purely positional             |
| **Data Types**          | Can handle heterogeneous data types       | Homogeneous data types only                 |
| **Performance**         | More overhead per operation (indexing, alignment) | Faster for raw numerical operations  |
| **Aggregation & Grouping**| Robust groupby and aggregation methods    | Limited grouping capabilities               |
| **Integration**         | Integrates well with other data tools     | Often serves as the computational backbone  |
| **Visualization**       | Integrated plotting (via Matplotlib)      | Requires external plotting libraries        |
| **Memory Efficiency**   | Higher memory usage due to metadata       | More memory-efficient for raw numerical data|
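
The "Data Alignment" row is easiest to see side by side. The sketch below adds two small Series whose labels are in a different order, then adds the same values as plain NumPy arrays. The tickers and numbers are made up for illustration.

```python
import pandas as pd

# Same labels, different order (illustrative values)
a = pd.Series([1.0, 2.0, 3.0], index=["AAPL", "MSFT", "BTC"])
b = pd.Series([10.0, 20.0, 30.0], index=["BTC", "AAPL", "MSFT"])

# Pandas matches values by label before adding
aligned_sum = a + b

# NumPy adds element by element, by position only
positional_sum = a.to_numpy() + b.to_numpy()

print(aligned_sum)
print(positional_sum)
```

The label-based result pairs each ticker with itself, while the positional result pairs whatever happens to sit at the same offset.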

### Example in Algorithmic Trading

#### Using Pandas

```python
import pandas as pd

# Load historical price data from CSV
data = pd.read_csv('historical_prices.csv', parse_dates=['Date'], index_col='Date')

# Calculate moving averages
data['MA50'] = data['Close'].rolling(window=50).mean()
data['MA200'] = data['Close'].rolling(window=200).mean()

# Calculate daily returns
data['Daily_Return'] = data['Close'].pct_change()

# Plot closing prices and moving averages
data[['Close', 'MA50', 'MA200']].plot()
```

#### Using NumPy

```python
import numpy as np

# Synthetic closing prices for illustration; the array must be at least
# as long as the largest window (200) for the averages to be meaningful
prices = np.linspace(100.0, 150.0, 250)

# Moving averages via convolution; mode='valid' keeps only full windows
MA50 = np.convolve(prices, np.ones(50)/50, mode='valid')
MA200 = np.convolve(prices, np.ones(200)/200, mode='valid')

# Calculate daily returns
daily_returns = np.diff(prices) / prices[:-1]
```

### Summary

- **Pandas**: Ideal for data manipulation, time series analysis, and handling heterogeneous data. It's especially useful in algorithmic trading for loading, cleaning, and analyzing large datasets of historical price data. Pandas carries more overhead than raw NumPy arrays, but offers a wide range of high-level data manipulation tools.
- **NumPy**: Best for efficient numerical computations with homogeneous data. It is faster and more memory-efficient but requires more manual handling for data manipulation tasks.

In algorithmic trading, both libraries are often used together: NumPy for efficient numerical operations and Pandas for data manipulation and analysis tasks.
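
A minimal sketch of the two libraries working together, assuming synthetic business-day prices: NumPy handles the element-wise math (log returns), while Pandas handles the labelled, time-aware step (a rolling mean). The 3-day window is arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices on business days (illustrative, not real market data)
dates = pd.date_range("2024-01-01", periods=10, freq="B")
close = pd.Series(np.linspace(100.0, 109.0, 10), index=dates, name="Close")

# NumPy ufunc applied to a Series: the result keeps the date index
log_returns = np.log(close / close.shift(1))

# Pandas rolling window: the first two values are NaN (window not yet full)
ma3 = close.rolling(window=3).mean()

print(log_returns.head())
print(ma3.head())
```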

### What is Pandas

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

https://pandas.pydata.org/about/index.html

### Pandas Series

A Pandas Series is like a column in a table: a one-dimensional labeled array that can hold data of any type.

### Importing Pandas

In [1]:
import numpy as np
import pandas as pd

### Series from lists

In [4]:
# string
stocks = ["Apple", "Microsoft", "Bitcoin","Ethereum","Apple"]
stock_obj  = pd.Series(stocks)
print(stock_obj)

0        Apple
1    Microsoft
2      Bitcoin
3     Ethereum
4        Apple
dtype: object


In [5]:
# integers
prices =[111,222,333,444,555]
pd.Series(prices)

0    111
1    222
2    333
3    444
4    555
dtype: int64

In [8]:
# custom index
stocks = ["Apple", "Microsoft", "Bitcoin", "Ethereum", "Ripple", "Litecoin"]
prices = [111, 222, 333, 444, 555, 1]

data = pd.Series(prices, index=stocks)
data

Apple        111
Microsoft    222
Bitcoin      333
Ethereum     444
Ripple       555
Litecoin       1
dtype: int64

In [9]:
# setting a name
data  = pd.Series(prices,index=stocks,name="Kuldeep's Portfolio")
data

Apple        111
Microsoft    222
Bitcoin      333
Ethereum     444
Ripple       555
Litecoin       1
Name: Kuldeep's Portfolio, dtype: int64

### Series from dict

In [10]:
# "Apple" appears twice: a dict keeps only the last value for a repeated key (1111)
price_dict = {"Apple":111, "Microsoft":222, "Bitcoin":333,"Ethereum":444,"Ripple":555,"Apple":1111}
data_prices = pd.Series(price_dict,name="Kuldeep's Price")
data_prices

Apple        1111
Microsoft     222
Bitcoin       333
Ethereum      444
Ripple        555
Name: Kuldeep's Price, dtype: int64

### Series Attributes

In [11]:
# size
data_prices.size

5

In [12]:
# dtype
data_prices.dtype

dtype('int64')

In [13]:
# name
data_prices.name

"Kuldeep's Price"

In [14]:
# is_unique
stock_obj.is_unique

False

In [15]:
# index
data_prices.index

Index(['Apple', 'Microsoft', 'Bitcoin', 'Ethereum', 'Ripple'], dtype='object')

In [16]:
# values
data_prices.values

array([1111,  222,  333,  444,  555], dtype=int64)

### Series using read_csv

In [27]:
# with one col
df = pd.read_csv("coin_Bitcoin.csv")
df["Name"]

0       Bitcoin
1       Bitcoin
2       Bitcoin
3       Bitcoin
4       Bitcoin
         ...   
2986    Bitcoin
2987    Bitcoin
2988    Bitcoin
2989    Bitcoin
2990    Bitcoin
Name: Name, Length: 2991, dtype: object

In [None]:
# with 2 cols
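
One possible way to get a Series from a two-column file is to use one column as the index and squeeze the remaining column. The sketch below uses a small in-memory CSV with hypothetical `Date` and `Close` columns and made-up values, since the layout of the real file is not shown here.

```python
import io
import pandas as pd

# Hypothetical two-column CSV standing in for a real file (values are made up)
csv_text = "Date,Close\n2021-01-01,100.0\n2021-01-02,101.5\n2021-01-03,99.0\n"

# First column becomes the index; the single remaining column is squeezed to a Series
close = pd.read_csv(
    io.StringIO(csv_text), index_col="Date", parse_dates=True
).squeeze("columns")

print(type(close))
print(close)
```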


### Series methods

In [None]:
# head and tail
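
A minimal sketch using the portfolio Series from the custom-index example: `head` and `tail` return the first and last rows, defaulting to 5 when no count is given.

```python
import pandas as pd

# Same illustrative portfolio as in the custom-index example
stocks = ["Apple", "Microsoft", "Bitcoin", "Ethereum", "Ripple", "Litecoin"]
prices = [111, 222, 333, 444, 555, 1]
data = pd.Series(prices, index=stocks)

first_three = data.head(3)  # first 3 rows
last_two = data.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```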


In [None]:
# sample


In [None]:
# value_counts -> prices
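
`value_counts` tallies how often each distinct value occurs, most frequent first. The sketch below applies it to the stock-name list from earlier because that list contains a repeat; applying it to a price Series works the same way.

```python
import pandas as pd

# Stock names from the "Series from lists" example ("Apple" appears twice)
stocks = ["Apple", "Microsoft", "Bitcoin", "Ethereum", "Apple"]
stock_obj = pd.Series(stocks)

# Frequency of each distinct value, sorted in descending order of count
counts = stock_obj.value_counts()
print(counts)
```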


In [None]:
# sort_values -> inplace
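
A short sketch of the two ways to sort by value: the default call returns a new Series, while `inplace=True` modifies the existing one and returns `None`.

```python
import pandas as pd

stocks = ["Apple", "Microsoft", "Bitcoin", "Ethereum", "Ripple", "Litecoin"]
prices = [111, 222, 333, 444, 555, 1]
data = pd.Series(prices, index=stocks)

# Returns a new, sorted Series; `data` is unchanged by this call
sorted_desc = data.sort_values(ascending=False)

# Sorts `data` itself (ascending by default) and returns None
result = data.sort_values(inplace=True)

print(sorted_desc)
print(data)
```

Reassigning (`data = data.sort_values()`) is often preferred over `inplace=True` because it chains more naturally.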


In [None]:
# sort_index -> inplace -> movies


### Series Maths Methods

In [None]:
# count


In [None]:
# sum -> product


In [None]:
# mean -> median -> mode -> std -> var


In [None]:
# min/max


In [None]:
# describe
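
`describe` bundles most of the statistics from the preceding cells into one summary Series. A minimal sketch with the integer prices used earlier:

```python
import pandas as pd

prices = pd.Series([111, 222, 333, 444, 555])

# count, mean, std, min, quartiles, and max in a single call
summary = prices.describe()
print(summary)

# The individual methods give the same numbers
print(prices.mean(), prices.median(), prices.std())
```

Note that `std` and `var` use the sample definition (`ddof=1`) by default, unlike NumPy's `np.std`, which defaults to `ddof=0`.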


### Series Indexing

In [None]:
# integer indexing
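
A sketch of positional access with `.iloc`, covering integer, negative, and slice indexing. Plain `data[0]` on a Series with string labels is deprecated in recent pandas versions, so `.iloc` is the safer choice for positions.

```python
import pandas as pd

stocks = ["Apple", "Microsoft", "Bitcoin", "Ethereum", "Ripple", "Litecoin"]
prices = [111, 222, 333, 444, 555, 1]
data = pd.Series(prices, index=stocks)

first = data.iloc[0]          # first element by position
last = data.iloc[-1]          # negative positions count from the end
middle = data.iloc[1:4]       # positions 1, 2, 3 (end is exclusive)
every_other = data.iloc[::2]  # step of 2

print(first, last)
print(middle)
print(every_other)
```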


In [None]:
# negative indexing


In [None]:
# slicing


In [None]:
# negative slicing


In [None]:
# fancy indexing


In [None]:
# indexing with labels -> fancy indexing
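
Label-based access goes through `.loc`. The sketch below shows a single label, a list of labels (fancy indexing), and a label slice, which includes both endpoints, unlike positional slicing.

```python
import pandas as pd

stocks = ["Apple", "Microsoft", "Bitcoin", "Ethereum", "Ripple", "Litecoin"]
prices = [111, 222, 333, 444, 555, 1]
data = pd.Series(prices, index=stocks)

one = data.loc["Bitcoin"]                        # single label
subset = data.loc[["Apple", "Ripple"]]           # list of labels
label_slice = data.loc["Microsoft":"Ethereum"]   # both endpoints included

print(one)
print(subset)
print(label_slice)
```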


### Editing Series

In [None]:
# using indexing


In [None]:
# what if an index does not exist
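
A sketch of what happens when the target does not exist: assigning to a new label through `.loc` appends an entry, while assigning to an out-of-range position through `.iloc` raises `IndexError`. The `"Solana"` label is hypothetical and used only for illustration.

```python
import pandas as pd

data = pd.Series([111, 222, 333], index=["Apple", "Microsoft", "Bitcoin"])

data.loc["Apple"] = 120   # existing label: value is overwritten
data.loc["Solana"] = 50   # new label: entry is appended

# Positional assignment cannot grow the Series
try:
    data.iloc[10] = 1
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True

print(data)
print(out_of_bounds_raised)
```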


In [None]:
# slicing


In [None]:
# fancy indexing


In [None]:
# using index label


### Copy and Views
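
Whether a slice shares memory with its parent depends on the pandas version and its Copy-on-Write setting, so the sketch below sticks to the one case that behaves the same everywhere: an explicit `.copy()` is always independent of the original.

```python
import pandas as pd

data = pd.Series([111, 222, 333], index=["Apple", "Microsoft", "Bitcoin"])

# Explicit copy: later edits to `data` do not affect it
snapshot = data.copy()

data.loc["Apple"] = 999

print(data["Apple"])      # modified value
print(snapshot["Apple"])  # original value preserved
```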

### Series with Python Functionalities

In [None]:
# len/type/dir/sorted/max/min


In [None]:
# type conversion


In [None]:
# membership operator



In [None]:
# looping


In [None]:
# Arithmetic Operators(Broadcasting)
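
A minimal sketch of broadcasting: a scalar is applied to every element, and two Series are combined element by element after matching labels. The quantities are hypothetical.

```python
import pandas as pd

prices = pd.Series([111, 222, 333], index=["Apple", "Microsoft", "Bitcoin"])

in_hundreds = prices / 100   # scalar broadcast to every element
plus_fee = prices + 1        # same idea with addition

# Hypothetical holdings, using the same labels as `prices`
quantities = pd.Series([2, 1, 3], index=prices.index)
position_value = prices * quantities

print(in_hundreds)
print(position_value)
```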


In [None]:
# Relational Operators



### Boolean Indexing on Series

In [None]:
# Find no of 50's and 100's scored by kohli


In [None]:
# find number of ducks


In [None]:
# Count number of day when I had more than 200 subs a day


In [None]:
# find actors who have done more than 20 movies
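
The datasets for the exercises above are not included here, so the sketch below shows the same pattern on the portfolio Series: a comparison produces a boolean mask, the mask filters the Series, and summing the mask counts the matches.

```python
import pandas as pd

stocks = ["Apple", "Microsoft", "Bitcoin", "Ethereum", "Ripple", "Litecoin"]
prices = [111, 222, 333, 444, 555, 1]
data = pd.Series(prices, index=stocks)

mask = data > 300             # boolean Series, one flag per element
expensive = data[mask]        # keep only rows where the mask is True
n_expensive = int(mask.sum()) # True counts as 1

# Conditions combine with & and |; each one needs its own parentheses
in_range = data[(data >= 100) & (data <= 400)]

print(expensive)
print(n_expensive)
print(in_range)
```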


### Plotting Graphs on Series

### Some Important Series Methods

In [None]:
# astype
# between
# clip
# drop_duplicates
# isnull
# dropna
# fillna
# isin
# apply
# copy

In [None]:
import numpy as np
import pandas as pd

In [None]:
# astype
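
`astype` converts a Series to another dtype. The sketch below converts the integer prices to float, to a smaller integer type, and to strings; the smaller type only makes sense when the values are known to fit.

```python
import pandas as pd

prices = pd.Series([111, 222, 333])   # int64 by default

as_float = prices.astype("float64")
as_small = prices.astype("int16")     # safe here because all values fit in int16
as_str = prices.astype(str)

print(as_float.dtype, as_small.dtype)
print(prices.nbytes, as_small.nbytes)  # bytes used by the underlying data
print(as_str.iloc[0])
```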


In [None]:
# between


In [None]:
# clip
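
`clip` limits values to a lower and upper bound. One plausible trading use is capping extreme returns before further analysis; the returns and the 5% bounds below are made up for illustration.

```python
import pandas as pd

# Hypothetical daily returns with two outliers
returns = pd.Series([-0.12, 0.01, 0.03, 0.25])

# Values below -0.05 become -0.05, values above 0.05 become 0.05
clipped = returns.clip(lower=-0.05, upper=0.05)

print(clipped)
```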


In [None]:
# drop_duplicates


In [None]:
# isnull


In [None]:
# dropna


In [None]:
# fillna
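
A short sketch of the three missing-data methods together, using a synthetic close series with gaps. Forward-filling is shown as one common choice for prices; whether it is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Synthetic closes with two missing values
close = pd.Series([100.0, np.nan, 102.0, np.nan])

n_missing = int(close.isnull().sum())  # count of missing entries
dropped = close.dropna()               # remove missing entries
filled_const = close.fillna(0.0)       # replace with a constant
filled_ffill = close.ffill()           # carry the last valid value forward

print(n_missing)
print(dropped)
print(filled_const)
print(filled_ffill)
```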


In [None]:
# isin


In [None]:
# apply
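
`apply` calls a Python function on each element. The sketch below labels prices as "high" or "low" around an arbitrary threshold of 300.

```python
import pandas as pd

data = pd.Series([111, 222, 333, 444], index=["Apple", "Microsoft", "Bitcoin", "Ethereum"])

# The function receives one element at a time
labels = data.apply(lambda p: "high" if p > 300 else "low")

print(labels)
```

For simple arithmetic or comparisons, vectorized operations such as `data > 300` are usually faster than `apply`, which runs a Python call per element.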


In [None]:
# copy