# MORE ON NUMPY

## COMPARISONS BETWEEN ARRAYS

In NumPy, you can perform various types of comparisons between arrays, such as element-wise comparisons, scalar comparisons, and more complex logical operations. Here are a few examples of comparisons using NumPy:

## 1) Element-wise Comparison
You can compare two NumPy arrays element by element using comparison operators (==, !=, <, >, <=, >=).  

### Example: Element-wise equality comparison

In [None]:
import numpy as np
# Create two arrays
a = np.array([1, 2, 3, 4])
b = np.array([1, 3, 3, 5])

# Element-wise comparison
result = a == b
print(result)  


Here, a == b compares each corresponding element of the arrays a and b. It returns an array of True or False based on whether the elements are equal.

### Example: Element-wise greater than comparison

In [None]:
result = a > b
print(result) 


This compares each element of a with the corresponding element of b and returns True if the element in a is greater, otherwise False.

## 2) Scalar Comparison with an Array
You can compare a scalar value with every element in an array.

### Example: Comparison with a scalar

In [None]:
result = a > 2
print(result) 


## 3) Logical Operations on Arrays
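NumPy also allows you to combine multiple comparisons element by element using the operators & (AND), | (OR) and ~ (NOT). Wrap each comparison in parentheses, because these operators bind more tightly than the comparison operators.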

### Example: Combining comparisons with logical AND (&)

In [None]:
result = (a > 2) & (b < 5)
print(result)  


Here, the expressions (a > 2) and (b < 5) are combined with the logical AND operator &. Each element of the result is True only where both conditions are True.

### Example: Logical OR (|)

In [None]:
result = (a < 3) | (b > 3)
print(result)  
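The ~ operator mentioned above has no example yet, so here is a minimal sketch of it, reusing the arrays a and b from the earlier cells. It also shows, as an illustration, why Python's own `and` keyword cannot replace &; the helper name `and_failed` is introduced only for this example.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([1, 3, 3, 5])

# ~ inverts a boolean array element by element
not_greater = ~(a > 2)
print(not_greater)  # [ True  True False False]

# Python's `and` asks for a single True/False value,
# which an array with several elements cannot provide
try:
    (a > 2) and (b < 5)
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)  # True
```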


## BOOLEAN MASKING

### Filtering on 1-dim array

In [None]:

# Create a 1D array of random integers between 1 and 50
np.random.seed(0)
random_array = np.random.randint(1, 51, size=20)
print(random_array)

In [None]:
compare = random_array <= 25
print(compare)

In [None]:
# Apply boolean masking to filter out all values greater than 25
filtered_array = random_array[compare]


In [None]:
# Print the original and filtered arrays
print("Original Array:", random_array)
print("Filtered Array (values <= 25):", filtered_array)

In [None]:
# we can use the short notation, building the mask directly inside the brackets
filtered_array = random_array[random_array <= 25]
print(filtered_array)
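A boolean mask is also useful on its own, without filtering. The sketch below rebuilds the same random array and uses the fact that True counts as 1 and False as 0; the names `mask`, `count` and `share` are chosen here for illustration.

```python
import numpy as np

np.random.seed(0)
random_array = np.random.randint(1, 51, size=20)

mask = random_array <= 25

count = mask.sum()       # how many values satisfy the condition
share = mask.mean()      # fraction of values that satisfy it
any_small = mask.any()   # True if at least one value satisfies it

print(count, share, any_small)
```

The count always equals the length of the filtered array `random_array[mask]`.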

### Filtering on matrices and tensors

Suppose we have the daily prices of 10 stocks over 30 days

In [None]:
np.random.seed(0)
# Simulate daily stock prices for 10 stocks over 30 days
# Assume stock prices range from $50 to $150
stock_prices = np.random.uniform(50, 150, (10, 30))
print(stock_prices)

In [None]:
# I want to extract prices between 90 (included) and 120 (excluded)
filter2 = (stock_prices >= 90) & (stock_prices < 120)
filter2

In [None]:
# apply the mask to the matrix to keep only the selected prices
stock_prices_filtered = stock_prices[filter2]
print(stock_prices_filtered)

In [None]:
# take care, result is a 1-dim array
stock_prices_filtered.shape

### What if I want to preserve the dimension of the original matrix?
First, I create a new matrix to save the original matrix

In [None]:
new_stock_prices = stock_prices

In [None]:
new_stock_prices[(new_stock_prices >= 90) & (new_stock_prices < 120)] = 0
print(new_stock_prices)

## Let's have a look at the original matrix

In [None]:
stock_prices

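We note that the original has changed too! WHY??? The assignment new_stock_prices = stock_prices does not copy the data: both names refer to the same array in memory, so a change made through one name is visible through the other.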
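One way to diagnose this behaviour is to check whether two names point to the same array. The following is a small sketch on a made-up array (the names `original` and `alias` are illustrative), using the `is` operator and `np.shares_memory`.

```python
import numpy as np

original = np.array([10.0, 20.0, 30.0])
alias = original          # plain assignment: no data is copied

alias[0] = 0.0            # modify through the second name
print(original)           # [ 0. 20. 30.]

# Both checks confirm that the two names refer to the same data
print(alias is original)                   # True
print(np.shares_memory(original, alias))   # True
```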

Let's see how to fix it

In [None]:
stock_prices = np.random.uniform(50, 150, (10, 30))

new_stock_prices = stock_prices.copy()

In [None]:
new_stock_prices[(new_stock_prices >= 90) & (new_stock_prices < 120)] = 0
new_stock_prices

In [None]:
stock_prices
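As an alternative sketch, `np.where` builds a new array of the same shape in one step and never touches the original. Here it keeps the prices that satisfy the condition and writes 0 everywhere else; the names `prices`, `in_range` and `kept` are assumptions made for this example.

```python
import numpy as np

np.random.seed(0)
prices = np.random.uniform(50, 150, (10, 30))

in_range = (prices >= 90) & (prices < 120)

# np.where(condition, value_if_true, value_if_false) returns a NEW array
kept = np.where(in_range, prices, 0)

print(kept.shape)      # same shape as the original matrix
print(prices.min())    # the original still has no zeros
```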

## Exercise: Stock Price Analysis
Scenario: You have daily stock prices of 10 stocks over 30 days in a 2D NumPy array. You need to analyze the data based on certain criteria.

## Task:

- Find all the stock prices in the dataset that are greater than 120.  
- Replace all stock prices below 80 with the value 80 (to simulate a price floor).  
- Count how many prices were adjusted to 80.

Starter Code:

In [None]:

# Simulate stock prices: 10 stocks over 30 days, prices range from $50 to $150
np.random.seed(0)
stock_prices = np.random.uniform(50, 150, (10, 30))

# 1. Find all prices greater than $120
# Your code here
# 2. Replace all prices below $80 with $80
# Your code here
# 3. Count how many prices were adjusted to $80
# Your code here

## Solutions

In [None]:
# Print the original stock prices
print("Original stock prices:")
print(stock_prices)

# 1. Find all prices greater than $120
high_prices = stock_prices[stock_prices > 120]
print("\nPrices greater than $120:")
print(high_prices)

# 2. Replace all prices below $80 with $80
below_floor = stock_prices < 80  # build the mask before changing the array
stock_prices[below_floor] = 80
print("\nStock prices after applying the price floor of $80:")
print(stock_prices)

# 3. Count how many prices were adjusted to $80
adjusted_count = np.sum(below_floor)
print(f"\nNumber of prices adjusted to $80: {adjusted_count}")
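The price floor can also be written without assigning into the array. The sketch below is one possible variant, not the only solution: it uses `np.maximum`, which returns a new array and leaves the input unchanged. The names `prices`, `below_floor` and `floored` are chosen for this example.

```python
import numpy as np

np.random.seed(0)
prices = np.random.uniform(50, 150, (10, 30))

below_floor = prices < 80            # which prices need adjusting
floored = np.maximum(prices, 80)     # element-wise max of each price and 80

adjusted_count = int(below_floor.sum())
print(f"Number of prices adjusted to $80: {adjusted_count}")
```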

# DATA MANIPULATION WITH PANDAS
Pandas is a library, built on NumPy, that provides a new object type, the **DataFrame**.  
A DataFrame is essentially a two-dimensional table that has **"labels" on the rows and columns** and can **host heterogeneous types** (a NumPy ndarray holds data of a single type) and/or missing data.   
DataFrames are therefore convenient for managing data.

Let's now import the Pandas library

In [None]:
import pandas as pd
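To make the description above concrete, here is a tiny hand-made DataFrame. All values and labels are invented for illustration; they are not taken from the course data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "Manufacturer": ["Acura", "Audi", "BMW"],     # text column
        "Price_in_thousands": [21.5, 23.99, np.nan],  # floats with a missing value
        "Doors": [4, 4, 2],                           # integer column
    },
    index=["Car_1", "Car_2", "Car_3"],                # row labels
)

print(toy)
print(toy.dtypes)   # each column has its own type
```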

## Loading external data
Let's load a file that contains sales data. It is in CSV format, one of the simplest ways to store tabular data in a plain text file.

In [None]:
# Loading file CSV
url = 'https://raw.githubusercontent.com/pal-dev-labs/Python-for-Economic-Applications-2024-2025/refs/heads/main/Data/sales_data.csv'
df = pd.read_csv(url)

# for the moment just run this command
df.index = [f"Car_{i}" for i in range(1, len(df) + 1)]


In [None]:
type(df)

In [None]:
# let's have a look
df

## Overall view of the table
Pay attention to the Dtype **object**: pandas typically uses it for columns that contain text (strings) or mixed types.

In [None]:
df.info()

## Accessing dataframe data

The DataFrame has an index attribute that holds the row labels and gives **access to the ROWS**

In [None]:
df.index

The DataFrame has a columns attribute that holds the column labels and gives **access to the COLUMNS**

In [None]:
df.columns


While with a plain NumPy ndarray we cannot add "Labels" to rows and columns, with Pandas it is possible.


We can use index to access the rows

In [None]:
# slicing by EXPLICIT index (labels): the end label 'Car_15' is included
df['Car_10':'Car_15']

In [None]:
# slicing by IMPLICIT index (positions): the end position 5 is excluded
df[3:5]
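The two kinds of slicing treat the end point differently, which is easy to overlook. Here is a minimal sketch on an invented six-row table; the `.loc` and `.iloc` forms are shown as the explicit equivalents.

```python
import pandas as pd

toy = pd.DataFrame({"x": range(6)}, index=[f"Car_{i}" for i in range(1, 7)])

by_label = toy["Car_2":"Car_4"]   # label slice: Car_4 IS included
by_position = toy[1:3]            # position slice: position 3 is NOT included

print(list(by_label.index))       # ['Car_2', 'Car_3', 'Car_4']
print(list(by_position.index))    # ['Car_2', 'Car_3']

# The same selections written with the dedicated indexers
same_label = toy.loc["Car_2":"Car_4"]
same_position = toy.iloc[1:3]
```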

In [None]:
# Let's try to extract a COLUMN
a = df['Manufacturer']   # in this case I access the column
print(a)

Note that a column is of type **Series**. A DataFrame is made up of many smaller objects (its columns), each of type **Series** 

In [None]:
type(a)

In [None]:
# Let's try to extract two COLUMNS
# we use a list inside the first []

df[['Manufacturer','Sales_in_thousands']]

In [None]:
# I can also combine a column selection with a row slice by position (implicit index)
df['Price_in_thousands'][1:9]

### How can we access a single element of the DataFrame?  
As with ndarrays we can use the [x,y] notation, through the **iloc indexer**  
Remember that the numbering starts from [0,0]

In [None]:
df.iloc[1,1]

We can also use slicing notation

In [None]:
# extract the third row (position 2)
df.iloc[2,:]

In [None]:
# we extract the tenth column (position 9)
df.iloc[:,9]

In [None]:
# we extract from the 3rd to the 6th row and from the 9th to the 10th column
df.iloc[2:6,8:10]

It's also possible to select specific positions by passing lists as parameters

In [None]:
df.iloc[[2,5,7],[3,4,9]]

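The dataframe columns are also ATTRIBUTES of the dataframe object.
We can then use the df.column_name notation (it works only when the column name is a valid Python identifier, so not for names containing spaces)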

In [None]:
df.Price_in_thousands
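A short sketch of the difference between the two notations, on an invented two-row table. The column names mirror those of the sales data, but the numbers are made up.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Price_in_thousands": [21.5, 28.4], "Social Advert": [1000, 2500]}
)

# Attribute and bracket notation return the same Series
same = toy.Price_in_thousands.equals(toy["Price_in_thousands"])
print(same)  # True

# A name with a space cannot be written after the dot: use brackets
advert = toy["Social Advert"]
print(advert)
```

Bracket notation always works, so it is the safer default when column names come from an external file.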

## NOW YOU TRY IT
- Extract the rows from 20 to 40 from the table
- Extract the 'Engine_size' and 'Horsepower' columns from the table
- Extract rows from 15 to 45 of the 'Price_in_thousands' and 'Fuel_capacity' columns
- Extract all rows of the columns from the 4th to the last
- Extract all the information in row 56

## Filters to extract information from the table

In [None]:
df.columns

In [None]:
# Let's try to take just one column
year = df['Price_in_thousands']

In [None]:
year

### I want to extract only ten rows

In [None]:
# we extract ten ROWS
b = year.iloc[10:20]
print(b)

### I want to check which rows have the value 39.895  
Let's use boolean comparison

In [None]:
b == 39.895

### I notice that the check returns True on each row where the condition holds.

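### I can also combine conditions with logical operators, like (20 < b) & (b < 35)
With a DataFrame or a Series, the operators act element by element:
- **&** is the bitwise operator **AND**
- **|** is the bitwise operator **OR**

Each condition must be wrapped in parentheses, because & and | bind more tightly than the comparison operators.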

In [None]:
b 

In [None]:
(20 < b) & (b < 35)

### The interesting thing is that if I apply that filter to the entire dataframe, I keep only the rows that satisfy the condition

In [None]:
year = df['Price_in_thousands']
filter1 = (20 < year) & (year < 35)

In [None]:
filter1

In [None]:
df_filtered = df[filter1]
df_filtered
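A boolean filter can also be combined with a column selection in a single step through `.loc`. This is a sketch on invented data; the names `toy`, `mask` and `subset` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Manufacturer": ["A", "B", "C", "D"],
        "Price_in_thousands": [18.0, 25.0, 33.5, 41.0],
    }
)

mask = (toy["Price_in_thousands"] > 20) & (toy["Price_in_thousands"] < 35)

# .loc[rows, columns]: rows chosen by the mask, columns chosen by name
subset = toy.loc[mask, ["Manufacturer"]]
print(subset)
```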

### As always it's possible to use a more compact syntax

In [None]:
df_filtered = df[(20 < year) & (year < 35)]
df_filtered

In [None]:
df_filtered = df[(df['Price_in_thousands']>20)&(df['Price_in_thousands']<35)]
df_filtered

In [None]:
#  OR operator
df_filtered = df[(df['Year']==1677) | (df['Year']==1679)]
df_filtered
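When the same column is compared with several values, a chain of | conditions can be replaced by `isin`. The table below is invented for illustration, and its Year values are arbitrary.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Year": [2010, 2011, 2012, 2011], "Model": ["a", "b", "c", "d"]}
)

# Two equivalent ways to keep the rows whose Year is 2010 or 2011
with_or = toy[(toy["Year"] == 2010) | (toy["Year"] == 2011)]
with_isin = toy[toy["Year"].isin([2010, 2011])]

print(with_isin)
```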

## NOW YOU TRY IT
- Extract the rows with Engine_size == 1.8
- Extract the rows with the Sales_in_thousands between 35 and 45
- Extract the rows that have Price_in_thousands greater than 40 and Social Advert greater than 50000. Print only the columns ['Manufacturer','Model','Price_in_thousands','Social Advert']

In [None]:
#df1 = df[df['Engine_size']==1.8]
#df1

In [None]:
#df2 = df[(df['Sales_in_thousands']>35)&(df['Sales_in_thousands']<45)]
#df2

In [None]:
#df3 = df[(df['Price_in_thousands']>40)&(df['Social Advert']>50000)][['Manufacturer','Model','Price_in_thousands','Social Advert']]
#df3


## NaN: Not A Number

The NaN value is used in Pandas to represent a missing value.
Pandas offers several methods for handling it, including:

- ``isnull()``: creates a boolean mask highlighting the NaNs with True
- ``notnull()``: the opposite of ``isnull()``
- ``dropna()``: returns a filtered version of the dataframe without NaN
- ``fillna()``: returns a copy of the dataframe with the NaN data replaced by other values
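The four methods listed above can be tried on a small invented table before applying them to real data. The column names and values below are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"magnitude": [5.1, np.nan, 6.3], "place": ["A", "B", None]}
)

mask = toy.isnull()             # True where a value is missing
present = toy.notnull()         # the opposite mask
complete_rows = toy.dropna()    # keeps only rows without missing values

# fillna accepts a dict to choose a different replacement per column
filled = toy.fillna({"magnitude": 0.0, "place": "unknown"})

print(mask)
print(complete_rows)
print(filled)
```

None of these calls modifies `toy`: each returns a new object.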

In [None]:
# Loading file CSV
url = 'https://raw.githubusercontent.com/pal-dev-labs/Python-for-Economic-Applications-2024-2025/refs/heads/main/Data/earthquakes.csv'
df = pd.read_csv(url)

In [None]:
df

In [None]:
df.info()

In [None]:
df.isnull()

In [None]:
df.notnull()

In [None]:
df.fillna("0")

## Now we can eliminate the NaNs by substituting a more tractable value. Here we use the string "0"

In [None]:
newdf = df.fillna("0")

In [None]:
newdf
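One point to keep in mind is that the string "0" and the number 0 are different replacements. The sketch below, on an invented numeric Series, suggests that filling with a string leaves text inside a numeric column, while filling with a number keeps the column numeric.

```python
import numpy as np
import pandas as pd

s = pd.Series([5.1, np.nan, 6.3])

as_text = s.fillna("0")     # the gap now holds a string
as_number = s.fillna(0)     # the gap now holds a number

print(type(as_text.iloc[1]))   # a str
print(as_number.dtype)         # still a float column
print(as_number.sum())         # arithmetic keeps working
```

Which replacement is appropriate depends on how the column will be used afterwards.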

## Checking for NaN values with Numpy array (ndarray)
You can check if elements are NaN (Not a Number) using np.isnan().

### Example: Checking for NaN values

In [None]:
c = np.array([1.0, np.nan, 2.0, np.nan])
print(c)

In [None]:
result = np.isnan(c)
print(result)  
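The reason a dedicated function is needed is that NaN is not equal to anything, not even to itself, so `c == np.nan` finds nothing. Here is a short sketch using the same array c, which also shows how to drop the NaNs or ignore them in a mean.

```python
import numpy as np

c = np.array([1.0, np.nan, 2.0, np.nan])

print(np.nan == np.nan)        # False
print((c == np.nan).any())     # False: == never detects NaN

clean = c[~np.isnan(c)]        # keep only the valid numbers
print(clean)                   # [1. 2.]

print(c.mean())                # nan: a single NaN contaminates the mean
print(np.nanmean(c))           # 1.5: the NaN-aware version ignores them
```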


In [None]:
# read_excel needs the xlrd package to open .xls files: install it first if it is missing
!pip install xlrd

In [None]:
# Import data and clean up the index
data_url = "https://github.com/QuantEcon/lecture-python-intro/raw/main/lectures/datasets/longprices.xls"
df_fig5 = pd.read_excel(data_url, 
                        sheet_name='all', 
                        header=2, 
                        index_col=0).iloc[1:]
df_fig5.index = df_fig5.index.astype(int)
df_fig5

In [None]:
# if you're working locally, install the yfinance module first:
# uncomment the line below and run the cell
# !pip install yfinance

In [None]:
# Import required libraries
import yfinance as yf
import matplotlib.pyplot as plt

In [None]:
# Define a list of 10 stock symbols (e.g., tech and other industries)
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA', 'META', 'NVDA', 'NFLX', 'ADBE', 'ORCL']

# Define the time range for data
start_date = '2022-01-01'
end_date = '2023-01-01'

# Fetch stock data using yfinance
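# auto_adjust=False keeps the 'Adj Close' column (recent yfinance versions adjust prices by default and omit it)
data = yf.download(tickers, start=start_date, end=end_date, auto_adjust=False)['Adj Close']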


In [None]:
data

In [None]:
data.index

In [None]:
type(data.index)

In [None]:
# Display the first few rows of the dataframe
print("First few rows of the stock data:")
print(data.head())

Because the index is made of dates, we can extract a time window by applying slice notation to date strings

In [None]:
# Filter data between May and July (inclusive) of 2022
window = data['2022-05-01':'2022-07-31']
window
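The same date slicing can be tried offline on synthetic data. The sketch below builds an invented daily series (the ticker name AAA and the values are made up) and also shows that `.loc` accepts a partial date string such as a single month.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2022-01-01", "2022-12-31", freq="D")
prices = pd.DataFrame({"AAA": np.arange(len(dates), dtype=float)}, index=dates)

# Both end dates are included in a date-string slice
window = prices.loc["2022-05-01":"2022-07-31"]
print(window.index[0], window.index[-1])

# A partial string selects the whole month
may_only = prices.loc["2022-05"]
print(len(may_only))   # 31
```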

We can extract the information for a single stock by selecting its column

In [None]:
aapl_data = data['AAPL']

In [None]:
# Calculate daily returns (percentage change)
apple_daily_returns = aapl_data.pct_change() * 100  # Convert to percentage
apple_daily_returns
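To see what `pct_change` computes, it can be compared with the formula written by hand: each price divided by the previous one, minus 1. The three prices below are invented for this sketch.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])

returns = prices.pct_change() * 100

# Same calculation by hand: shift(1) moves every price one step forward,
# so each row is divided by the previous day's price
manual = (prices / prices.shift(1) - 1) * 100

print(returns)   # first value is NaN, then about +10 and -10
print(manual)
```

The first return is always NaN, because the first day has no previous price to compare with.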

### Next Steps: plot data

In [None]:
# Plot the adjusted closing prices, one line per ticker
plt.figure(figsize=(14, 7))
for ticker in tickers:
    plt.plot(data.index, data[ticker], label=ticker)

plt.title("Stock Prices Over Time")
plt.xlabel("Date")
plt.ylabel("Adjusted Close Price")
plt.legend(loc="upper left", fontsize=10)
plt.grid()
plt.show()
