# NumPy & Pandas Mini-Challenges + Capstone (Portfolio Notebook)

This notebook showcases hands-on practice with **NumPy** and **pandas** through a set of mini-challenges and a small capstone project.

## What this demonstrates
- Creating and manipulating **NumPy arrays** (1D/2D), indexing/slicing, boolean masking
- Common **pandas** workflows: DataFrame creation, filtering, aggregation, concatenation
- Reading tabular data from **HTML** with `pd.read_html`
- A capstone exercise building a simple **bank client** dataset and extending it with additional records

## Setup

In [1]:
import numpy as np
import pandas as pd

## Capstone project — Bank client dataset

### Goal
Build a simple client dataset:
1. Define two DataFrames for clients (IDs 1–5 and 6–10)
2. Add **Annual Salary** for each client
3. Concatenate the datasets
4. Add a new client (ID 11) and append it to the combined DataFrame

In [16]:
# Creating a DataFrame for bank clients
Bank_df_1 = pd.DataFrame({'Bank Client ID': [1, 2, 3, 4, 5],
                          'First Name': ['Maya', 'Noah', 'Emma', 'Liam', 'Sofia'],
                          'Last Name': ['Vermeer', 'de Vries', 'Jansen', 'Bakker', 'Visser']})
Bank_df_1


| | Bank Client ID | First Name | Last Name |
|---|---|---|---|
| 0 | 1 | Maya | Vermeer |
| 1 | 2 | Noah | de Vries |
| 2 | 3 | Emma | Jansen |
| 3 | 4 | Liam | Bakker |
| 4 | 5 | Sofia | Visser |


In [17]:
# Creating another DataFrame for bank clients
Bank_df_2 = pd.DataFrame({'Bank Client ID': [6, 7, 8, 9, 10],
                         'First Name': ['Lucas', 'Mila', 'Ethan', 'Ava', 'Oliver'],
                         'Last Name': ['Smit', 'van Dijk', 'Meijer', 'Peters', 'Mulder']})
Bank_df_2


| | Bank Client ID | First Name | Last Name |
|---|---|---|---|
| 0 | 6 | Lucas | Smit |
| 1 | 7 | Mila | van Dijk |
| 2 | 8 | Ethan | Meijer |
| 3 | 9 | Ava | Peters |
| 4 | 10 | Oliver | Mulder |


In [19]:
# Adding an 'Annual Salary' column to Bank_df_1
Bank_df_1["Annual Salary"] = [62500, 48200, 91300, 73900, 55100]   # IDs 1–5
print(Bank_df_1)

# Adding an 'Annual Salary' column to Bank_df_2
Bank_df_2["Annual Salary"] = [120000, 66750, 84400, 39950, 102600] # IDs 6–10
print(Bank_df_2)

   Bank Client ID First Name Last Name  Annual Salary
0               1       Maya   Vermeer          62500
1               2       Noah  de Vries          48200
2               3       Emma    Jansen          91300
3               4       Liam    Bakker          73900
4               5      Sofia    Visser          55100
   Bank Client ID First Name Last Name  Annual Salary
0               6      Lucas      Smit         120000
1               7       Mila  van Dijk          66750
2               8      Ethan    Meijer          84400
3               9        Ava    Peters          39950
4              10     Oliver    Mulder         102600


In [None]:

# Concatenating the two DataFrames
bank_df_all = pd.concat([Bank_df_1, Bank_df_2])

# Resetting the index of the concatenated DataFrame
bank_df_all.reset_index(drop=True, inplace=True)
print(bank_df_all)


   Bank Client ID First Name Last Name  Annual Salary
0               1       Maya   Vermeer          62500
1               2       Noah  de Vries          48200
2               3       Emma    Jansen          91300
3               4       Liam    Bakker          73900
4               5      Sofia    Visser          55100
5               6      Lucas      Smit         120000
6               7       Mila  van Dijk          66750
7               8      Ethan    Meijer          84400
8               9        Ava    Peters          39950
9              10     Oliver    Mulder         102600
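`pd.concat` can also renumber the rows in the same call via `ignore_index=True`, which replaces the separate `reset_index(drop=True)` step. A minimal sketch on a two-row subset of the data above (`df_a`, `df_b` and `combined` are illustrative names):

```python
import pandas as pd

# Two small frames with the same columns (a subset of the capstone data)
df_a = pd.DataFrame({'Bank Client ID': [1, 2], 'Annual Salary': [62500, 48200]})
df_b = pd.DataFrame({'Bank Client ID': [6, 7], 'Annual Salary': [120000, 66750]})

# ignore_index=True discards the original row labels (0, 1, 0, 1)
# and builds a fresh 0..n-1 index during the concat itself
combined = pd.concat([df_a, df_b], ignore_index=True)
print(combined)
```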


In [24]:
# Creating a DataFrame for a new bank client
new_client = pd.DataFrame({'Bank Client ID': [11],
                          'First Name': ['Zoe'],
                          'Last Name': ['van Leeuwen'],
                          'Annual Salary' : [58300]})
print(new_client)

# Adding the new client to the existing DataFrame
all_clients = pd.concat([bank_df_all, new_client])
all_clients.reset_index(drop=True, inplace=True)


   Bank Client ID First Name    Last Name  Annual Salary
0              11        Zoe  van Leeuwen          58300


In [25]:
print(all_clients)

    Bank Client ID First Name    Last Name  Annual Salary
0                1       Maya      Vermeer          62500
1                2       Noah     de Vries          48200
2                3       Emma       Jansen          91300
3                4       Liam       Bakker          73900
4                5      Sofia       Visser          55100
5                6      Lucas         Smit         120000
6                7       Mila     van Dijk          66750
7                8      Ethan       Meijer          84400
8                9        Ava       Peters          39950
9               10     Oliver       Mulder         102600
10              11        Zoe  van Leeuwen          58300
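`DataFrame.append` was removed in pandas 2.0, so `pd.concat` is the standard way to add rows, as done above. If client IDs are meant to be unique (an assumption about the data model, not something the exercise states), one option is to use the ID as the index and pass `verify_integrity=True`, which makes `pd.concat` raise a `ValueError` when the result would contain duplicate index values. A minimal sketch with hypothetical names (`clients_by_id`, `new_row`, `duplicate`):

```python
import pandas as pd

# Index the clients by their ID so that pandas can detect duplicate IDs
clients_by_id = pd.DataFrame({'Bank Client ID': [1, 2, 3],
                              'First Name': ['Maya', 'Noah', 'Emma']}).set_index('Bank Client ID')
new_row = pd.DataFrame({'Bank Client ID': [11],
                        'First Name': ['Zoe']}).set_index('Bank Client ID')

# verify_integrity=True checks the concatenated index for duplicates
clients_by_id = pd.concat([clients_by_id, new_row], verify_integrity=True)

# A row whose ID already exists is rejected instead of being silently duplicated
duplicate = pd.DataFrame({'Bank Client ID': [2],
                          'First Name': ['Liam']}).set_index('Bank Client ID')
try:
    pd.concat([clients_by_id, duplicate], verify_integrity=True)
    rejected = False
except ValueError:
    rejected = True

print(clients_by_id)
print('Duplicate ID rejected:', rejected)
```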


### Quick checks
These are common lightweight checks in real analysis workflows:
- row counts
- missing values
- basic summary stats

In [26]:
# Checking the shape and missing values in the final DataFrame
all_clients.shape, all_clients.isna().sum()


((11, 4),
 Bank Client ID    0
 First Name        0
 Last Name         0
 Annual Salary     0
 dtype: int64)

In [None]:
# Descriptive statistics for the 'Annual Salary' column
all_clients["Annual Salary"].describe()

count        11.000000
mean      73000.000000
std       24380.268661
min       39950.000000
25%       56700.000000
50%       66750.000000
75%       87850.000000
max      120000.000000
Name: Annual Salary, dtype: float64
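The quartiles and standard deviation in this summary can be reproduced by hand, which makes the conventions behind `describe()` explicit: quartiles use linear interpolation between ranks, and the standard deviation is the sample version (`ddof=1`). A short check with NumPy, using the same 11 salaries copied from the cells above:

```python
import numpy as np

# Annual salaries of the 11 clients in all_clients (copied from the cells above)
salaries = np.array([62500, 48200, 91300, 73900, 55100,
                     120000, 66750, 84400, 39950, 102600, 58300])

# np.percentile defaults to linear interpolation, like describe(): the 25% position
# is (11 - 1) * 0.25 = 2.5, halfway between the 3rd and 4th smallest (55100 and 58300)
q25, q50, q75 = np.percentile(salaries, [25, 50, 75])

# Series.std() is the sample standard deviation (ddof=1); NumPy's default is ddof=0
sample_std = np.std(salaries, ddof=1)

print(q25, q50, q75)  # 56700.0 66750.0 87850.0
print(sample_std)
```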

In [None]:
# Calculating the mean annual salary
all_clients["Annual Salary"].mean()

np.float64(73000.0)

## NumPy mini-challenges

### Mini-challenge 1 — Create a 2×4 array
Create the following 2×4 NumPy array:

[[3 7 9 3] \
[4 3 2 2]]

In [2]:
# Creating a 2D NumPy array (matrix)
my_matrix = np.array([[3, 7, 9, 3], [4, 3, 2, 2]])
my_matrix

array([[3, 7, 9, 3],
       [4, 3, 2, 2]])
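Once the array exists, its shape and zero-based indexing can be inspected directly. A short sketch using the same `my_matrix`:

```python
import numpy as np

my_matrix = np.array([[3, 7, 9, 3], [4, 3, 2, 2]])

print(my_matrix.shape)    # (2, 4): 2 rows, 4 columns
print(my_matrix[1, 2])    # row 1, column 2 (zero-based) -> 2
print(my_matrix[:, 0])    # every row, first column -> [3 4]
print(my_matrix[0, 1:3])  # first row, columns 1 and 2 -> [7 9]
```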

### Mini-challenge 2 — Random 1×10 array from 0 to x
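Take a positive integer `x` and create a 1×10 array of random integers from 0 to `x`. Note that `np.random.randint(0, x, 10)`, used below, excludes the upper bound, so it draws values from 0 to `x - 1` and returns a 1-D array of length 10; pass `x + 1` as the upper bound to include `x`, and `size=(1, 10)` for a literal 1×10 array.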

> If you want this notebook to run non-interactively, replace the `input(...)` line with a fixed value (e.g., `x = 10`).

In [None]:
# Generating a random NumPy array based on user input
x = int(input('Enter a positive number: '))

# Generating a random array of 10 integers between 0 and x-1
matrix = np.random.randint(0, x, 10)

print(matrix)

[65 56 18 26  6 27 12 19 61 35]
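NumPy's documentation recommends the `Generator` API (`np.random.default_rng`) for new code. A minimal sketch, assuming a fixed `x = 10` in place of `input(...)` and an arbitrary seed of 42 for reproducibility; it shows both the exclusive upper bound used above and an inclusive variant shaped as a true 1×10 array:

```python
import numpy as np

x = 10  # fixed value in place of input(), as suggested above

# A seeded Generator makes the "random" result reproducible between runs
rng = np.random.default_rng(seed=42)

# integers() excludes the upper bound by default, like randint: values in 0..x-1
exclusive = rng.integers(0, x, size=10)

# endpoint=True includes x itself, matching "from 0 to x" literally;
# size=(1, 10) gives a 2-D 1x10 array rather than a 1-D array of length 10
inclusive = rng.integers(0, x, size=(1, 10), endpoint=True)

print(exclusive)
print(inclusive)
```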


### Mini-challenge 3 — Distance between two 3D points
Given:

X = [5, 7, 20] \
Y = [9, 15, 4]


Compute the Euclidean distance between X and Y.

In [5]:
# Calculating the Euclidean distance between two NumPy arrays
x = np.array([5, 7, 20])
y = np.array([9, 15, 4])

# Euclidean distance calculation
distance = np.linalg.norm(x - y)
distance

np.float64(18.33030277982336)
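`np.linalg.norm(x - y)` computes the square root of the sum of squared coordinate differences: (5 - 9)² + (7 - 15)² + (20 - 4)² = 16 + 64 + 256 = 336, and √336 ≈ 18.3303. A short sketch that spells this out and cross-checks it against the standard library's `math.dist` (Python 3.8+):

```python
import math

import numpy as np

x = np.array([5, 7, 20])
y = np.array([9, 15, 4])

# Euclidean distance written out step by step
diff = x - y                          # [-4, -8, 16]
manual = np.sqrt(np.sum(diff ** 2))   # sqrt(16 + 64 + 256) = sqrt(336)

# The same computation for plain Python sequences
stdlib = math.dist([5, 7, 20], [9, 15, 4])

print(manual, stdlib, np.linalg.norm(diff))
```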

### Mini-challenge 4 — Replace the last row with zeros
In the matrix below, replace the last row with 0.

X = [ 2 30 20 -2 -4] \
    [ 3 4 40 -3 -2] \
    [-3 4 -6 90 10] \
    [25 45 34 22 12] \
    [13 24 22 32 37]
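One way to replace the last row is a negative row index combined with scalar broadcasting. A minimal sketch that works on a copy (`X_zeroed` is an illustrative name), so the original `X` stays available for Mini-challenge 5:

```python
import numpy as np

X = np.array([[2, 30, 20, -2, -4],
              [3, 4, 40, -3, -2],
              [-3, 4, -6, 90, 10],
              [25, 45, 34, 22, 12],
              [13, 24, 22, 32, 37]])

# Work on a copy; assigning into X directly would modify it in place
X_zeroed = X.copy()

# X_zeroed[-1] selects the last row; the scalar 0 is broadcast across all its columns
X_zeroed[-1] = 0

print(X_zeroed)
```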

### Mini-challenge 5 — Replace negatives and odds
- Replace negative elements with `0`
- Replace odd elements with `-2`

In [None]:
# Modifying elements of a NumPy array based on conditions
X = np.array([[2, 30, 20, -2, -4], 
              [3, 4,  40, -3, -2],
              [-3, 4, -6, 90, 10],
              [25, 45, 34, 22, 12],
              [13, 24, 22, 32, 37]])

# Setting negative values to 0 and odd values to -2
X[X < 0] = 0
X[X % 2 == 1] = -2

X

array([[ 2, 30, 20,  0,  0],
       [-2,  4, 40,  0,  0],
       [ 0,  4,  0, 90, 10],
       [-2, -2, 34, 22, 12],
       [-2, 24, 22, 32, -2]])
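In the in-place version above, the order of the two assignments matters: swapping them would turn the freshly written `-2` values into `0`, since they are negative. A non-mutating alternative with nested `np.where` evaluates both conditions on the original values and gives the same result:

```python
import numpy as np

X = np.array([[2, 30, 20, -2, -4],
              [3, 4, 40, -3, -2],
              [-3, 4, -6, 90, 10],
              [25, 45, 34, 22, 12],
              [13, 24, 22, 32, 37]])

# Outer condition wins: negatives become 0; among the remaining values, odd ones become -2.
# X itself is left unchanged.
result = np.where(X < 0, 0, np.where(X % 2 == 1, -2, X))

print(result)
```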

## Pandas mini-challenges

### Mini-challenge 6 — Build a simple stock portfolio table
Create a DataFrame with:
- stock ticker symbols
- number of shares
- price per share

Then compute the total portfolio value.

In [7]:
# Calculating the total dollar value of stocks in a portfolio
portfolio_df = pd.DataFrame({'Stock Symbol': ['AAPL', 'MSFT', 'NVDA'],
                            'Number of Shares': [12, 5, 20],
                            'Price per Share(USD)': [259.96, 459.38, 183.14]})

print(portfolio_df)

# Calculating the dollar value for each stock
stocks_dollar_value = portfolio_df['Number of Shares'] * portfolio_df['Price per Share(USD)']

# Calculating the total dollar value of the portfolio
total_portfolio_value = stocks_dollar_value.sum()

print(f'Total Portfolio Value: ${total_portfolio_value:.2f}')



  Stock Symbol  Number of Shares  Price per Share(USD)
0         AAPL                12                259.96
1         MSFT                 5                459.38
2         NVDA                20                183.14
Total Portfolio Value: $9079.22
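Storing each position's value as a column keeps the intermediate result visible next to its inputs and makes follow-up questions, such as each stock's share of the portfolio, one line each. A minimal sketch; the column names `Position Value(USD)` and `Weight(%)` and the frame name `portfolio` are hypothetical:

```python
import pandas as pd

portfolio = pd.DataFrame({'Stock Symbol': ['AAPL', 'MSFT', 'NVDA'],
                          'Number of Shares': [12, 5, 20],
                          'Price per Share(USD)': [259.96, 459.38, 183.14]})

# Value of each position (shares x price), kept as a column
portfolio['Position Value(USD)'] = (portfolio['Number of Shares']
                                    * portfolio['Price per Share(USD)'])

# Total value and each position's weight in percent
total = portfolio['Position Value(USD)'].sum()
portfolio['Weight(%)'] = (portfolio['Position Value(USD)'] / total * 100).round(2)

print(portfolio)
print(f'Total Portfolio Value: ${total:.2f}')
```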


### Mini-challenge 7 — Read tabular data from HTML
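Use `pd.read_html` to read tables from a webpage. It returns a list of DataFrames, one per `<table>` element on the page, and needs an HTML parser such as `lxml` (or `beautifulsoup4` with `html5lib`) to be installed.

Examples used below: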
- Canadian house price tables (LivingIn-Canada)
- FDIC failed bank list (FDIC)

> Some sites (like SSA) may block automated requests; that’s a site policy rather than a pandas issue.

In [8]:
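# Scraping house price tables from a webpage; read_html returns a list of DataFrames
house_price_df = pd.read_html('https://www.livingin-canada.com/house-prices-canada.html')

# Jupyter renders only the last expression in a cell, so the output below is the second table (index 1);
# to also show the first one, call display(house_price_df[0]) before this line
house_price_df[1]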


| | Province | Average House Price | 12 Month Change |
|---|---|---|---|
| 0 | British Columbia | $736,000 | + 7.6 % |
| 1 | Ontario | $594,000 | – 3.2 % |
| 2 | Alberta | $353,000 | – 7.5 % |
| 3 | Quebec | $340,000 | + 7.6 % |
| 4 | Manitoba | $295,000 | – 1.4 % |
| 5 | Saskatchewan | $271,000 | – 3.8 % |
| 6 | Nova Scotia | $266,000 | + 3.5 % |
| 7 | Prince Edward Island | $243,000 | + 3.0 % |
| 8 | Newfoundland / Labrador | $236,000 | – 1.6 % |
| 9 | New Brunswick | $183,000 | – 2.2 % |
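The scraped columns are text (`'$736,000'`, `'– 3.2 %'`), so they need cleaning before any arithmetic. A minimal sketch on three rows copied from the output above; it assumes the minus sign on the page is an en dash (U+2013), as it appears in this output, and the frame name `prices` and the new column `12 Month Change (%)` are hypothetical:

```python
import pandas as pd

# Three rows copied from the scraped table above, still as text
prices = pd.DataFrame({
    'Province': ['British Columbia', 'Ontario', 'Alberta'],
    'Average House Price': ['$736,000', '$594,000', '$353,000'],
    '12 Month Change': ['+ 7.6 %', '\u2013 3.2 %', '\u2013 7.5 %'],
})

# Strip the dollar sign and thousands separators, then convert to integers
prices['Average House Price'] = (
    prices['Average House Price'].str.replace(r'[$,]', '', regex=True).astype(int)
)

# Assumption: negatives use an en dash; swap it for an ASCII minus,
# then drop spaces and the percent sign before converting to float
prices['12 Month Change (%)'] = (
    prices['12 Month Change']
    .str.replace('\u2013', '-', regex=False)
    .str.replace(r'[\s%]', '', regex=True)
    .astype(float)
)

print(prices)
```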


In [9]:
# Scraping failed bank data from the FDIC website
failed_banks_df = pd.read_html('https://www.fdic.gov/bank/individual/failed/banklist.html')

# Accessing the first table from the list
failed_banks_df[0]



| | Bank Name | City | State | Cert | Acquiring Institution | Closing Date | Fund |
|---|---|---|---|---|---|---|---|
| 0 | The Santa Anna National Bank | Santa Anna | Texas | 5520 | Coleman County State Bank | June 27, 2025 | 10549 |
| 1 | Pulaski Savings Bank | Chicago | Illinois | 28611 | Millennium Bank | January 17, 2025 | 10548 |
| 2 | The First National Bank of Lindsay | Lindsay | Oklahoma | 4134 | First Bank & Trust Co. | October 18, 2024 | 10547 |
| 3 | Republic First Bank dba Republic Bank | Philadelphia | Pennsylvania | 27332 | Fulton Bank, National Association | April 26, 2024 | 10546 |
| 4 | Citizens Bank | Sac City | Iowa | 8758 | Iowa Trust & Savings Bank | November 3, 2023 | 10545 |
| 5 | Heartland Tri-State Bank | Elkhart | Kansas | 25851 | Dream First Bank, N.A. | July 28, 2023 | 10544 |
| 6 | First Republic Bank | San Francisco | California | 59017 | JPMorgan Chase Bank, N.A. | May 1, 2023 | 10543 |
| 7 | Signature Bank | New York | New York | 57053 | Flagstar Bank, N.A. | March 12, 2023 | 10540 |
| 8 | Silicon Valley Bank | Santa Clara | California | 24735 | First Citizens Bank & Trust Company | March 10, 2023 | 10539 |
| 9 | Almena State Bank | Almena | Kansas | 15426 | Equity Bank | October 23, 2020 | 10538 |
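The `Closing Date` column arrives as text such as `'June 27, 2025'`; parsing it with an explicit format turns it into real dates that can be grouped. A minimal sketch on the ten closing dates shown above (`closing`, `dates` and `per_year` are illustrative names; the counts cover only these ten rows):

```python
import pandas as pd

# Closing dates copied from the ten rows shown above, still as text
closing = pd.Series(['June 27, 2025', 'January 17, 2025', 'October 18, 2024',
                     'April 26, 2024', 'November 3, 2023', 'July 28, 2023',
                     'May 1, 2023', 'March 12, 2023', 'March 10, 2023',
                     'October 23, 2020'])

# %B = full month name, %d = day of month, %Y = four-digit year
dates = pd.to_datetime(closing, format='%B %d, %Y')

# Number of failures per closing year, oldest year first
per_year = dates.dt.year.value_counts().sort_index()
print(per_year)
```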


### Mini-challenge 8 — Filter + aggregate
Select loyal clients (5 or more years with the bank) and high-net-worth clients (net worth ≥ $5,000), then compute the combined net worth of the high-net-worth group.

In [10]:
# Creating a DataFrame for bank clients
bank_client_df = pd.DataFrame({'Bank Client ID':[111, 222, 333, 444],
                              'Bank Client Name':['Chanel', 'Steve', 'Mitch', 'Ryan'],
                              'Net Worth[$]': [3500, 29000, 10000, 2000],
                              'Years with bank': [3, 4, 9, 5]})
bank_client_df

| | Bank Client ID | Bank Client Name | Net Worth[$] | Years with bank |
|---|---|---|---|---|
| 0 | 111 | Chanel | 3500 | 3 |
| 1 | 222 | Steve | 29000 | 4 |
| 2 | 333 | Mitch | 10000 | 9 |
| 3 | 444 | Ryan | 2000 | 5 |


In [13]:
# Filtering clients who have been with the bank for 5 or more years
# (a bare expression in the middle of a cell is not rendered, so df_loyal does not appear below)
df_loyal = bank_client_df[bank_client_df['Years with bank'] >= 5]
df_loyal

# Calculating the total net worth of clients with net worth >= $5000
high_net_worth_df = bank_client_df[bank_client_df['Net Worth[$]'] >= 5000]
total_net_worth = high_net_worth_df['Net Worth[$]'].sum()
print('Total net worth of high net worth clients:', total_net_worth)

Total net worth of high net worth clients: 39000
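The two filters can also be combined: boolean masks join with `&` (and) or `|` (or), and `.loc` selects the matching rows and a subset of columns in one step. A minimal sketch that finds clients who are both loyal and high-net-worth (`loyal`, `wealthy` and `loyal_and_wealthy` are illustrative names):

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth[$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

# Masks stored in variables; when written inline, wrap each comparison in
# parentheses, because & binds more tightly than >= in Python
loyal = bank_client_df['Years with bank'] >= 5
wealthy = bank_client_df['Net Worth[$]'] >= 5000

loyal_and_wealthy = bank_client_df.loc[loyal & wealthy, ['Bank Client Name', 'Net Worth[$]']]
print(loyal_and_wealthy)  # only Mitch: 9 years with the bank, net worth 10000
```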


### Mini-challenge 9 — Apply a function and compute totals
Define a function that triples a value and adds 200, apply it to each client's net worth, then compute the updated total.

In [14]:
# Creating a DataFrame for bank clients
bank_client_df = pd.DataFrame({'Bank client ID':[111, 222, 333, 444], 
                               'Bank Client Name':['Chanel', 'Steve', 'Mitch', 'Ryan'], 
                               'Net worth [$]':[3500, 29000, 10000, 2000], 
                               'Years with bank':[3, 4, 9, 5]})
bank_client_df

| | Bank client ID | Bank Client Name | Net worth [$] | Years with bank |
|---|---|---|---|---|
| 0 | 111 | Chanel | 3500 | 3 |
| 1 | 222 | Steve | 29000 | 4 |
| 2 | 333 | Mitch | 10000 | 9 |
| 3 | 444 | Ryan | 2000 | 5 |


In [15]:
# Defining a function that triples a value and adds 200
def triple_stock(value):
    return value * 3 + 200

# Applying the function to each value in the 'Net worth [$]' column
updated_stocks = bank_client_df['Net worth [$]'].apply(triple_stock)
print(updated_stocks)

print('The updated total net worth of all clients combined is: ' + str(updated_stocks.sum()))


0    10700
1    87200
2    30200
3     6200
Name: Net worth [$], dtype: int64
The updated total net worth of all clients combined is: 134300
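For simple arithmetic like this, applying the formula directly to the column is the more common pandas idiom than `.apply`: it operates on the whole column at once instead of calling a Python function per element. A minimal sketch that checks both approaches give identical results (`updated` and `via_apply` are illustrative names):

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

# Vectorized: the arithmetic is applied to the entire column in one expression
updated = bank_client_df['Net worth [$]'] * 3 + 200

# Element-wise: .apply calls the function once per value
via_apply = bank_client_df['Net worth [$]'].apply(lambda v: v * 3 + 200)

print(updated.tolist())  # [10700, 87200, 30200, 6200]
print(updated.sum())     # 134300
```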
