
### Question 1.1
The 'add_products' function below behaves unexpectedly: the number of elements in the portfolio doesn't match what we would expect, and neither does the reported time taken to register the order.
Please fix the code and explain the reason for the strange behaviour.

In [None]:
import time

### modify the function to fix the issues you find
def add_products(account_name, products, portfolio=[], invested_at=time.time()):
    portfolio.extend(products)
    time.sleep(3)
    print(f"""---- Added {len(products)} to the {account_name}'s portfolio. ----
    The {account_name}'s portfolio contains now {len(portfolio)} different products.
    Registering the order took {round(time.time()-invested_at, 3)} seconds\n""")

    return portfolio
    

In [None]:
from dataclasses import dataclass
# Sample data
@dataclass
class Product:
    name: str
    amount: int
    price: float
    category: str
    
existing_interviewer_portfolio= [
    Product('SPY', 2, 5.56, 'ETF'),
    Product('VOO', -1, 25.21, 'ETF'),
    Product('MSFT', 13, 79.56, 'Equity'),
    Product('AAPL', -10, -23.5, 'Equity'),
    Product('AMZN', 4, -274.5, 'Equity'),
]

In [None]:
# Portfolio 1
pf_1 = add_products(
    "Demo account",
    products=[
        Product("SPY", -25, 5.56, "ETF"),
        Product("MSFT", 13, 3.76, "Equity"),
        Product("FXAIX", -2, 16.24, "Fund"),
    ],
)

# Portfolio 2
pf_2 = add_products(
    "Manager account", products=[Product("ZL=F", 34, -23.5, "Future")]
)

# Portfolio 3
pf_3 = add_products(
    "Interviewer account",
    products=[
        Product("ETH-USD", 10, 1256.56, "CCC"),
        Product("BTC-USD", 1, 15_000.56, "CCC"),
        Product("AAPL", 12, 143.43, "Equity"),
    ],
    portfolio=existing_interviewer_portfolio,
)

print("portfolio 1", pf_1)
print("portfolio 2", pf_2)
print("portfolio 3", pf_3)
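One possible fix, as a sketch. Both default values are evaluated once, when the function is defined: the list in `portfolio=[]` is shared by every call that omits the argument, and `invested_at=time.time()` is frozen at definition time. Using `None` as a sentinel and building the values inside the function avoids both problems. Copying the incoming list (so the caller's list is not mutated) and the `sleep_seconds` parameter are assumptions added for illustration; they are not part of the original function.

```python
import time


def add_products(account_name, products, portfolio=None, invested_at=None,
                 sleep_seconds=3):
    # Evaluate the defaults on every call instead of once at definition time.
    invested_at = time.time() if invested_at is None else invested_at
    # Assumption: work on a copy so the caller's list is left untouched.
    portfolio = [] if portfolio is None else list(portfolio)

    portfolio.extend(products)
    time.sleep(sleep_seconds)  # hypothetical parameter, stands in for real work
    elapsed = round(time.time() - invested_at, 3)
    print(f"---- Added {len(products)} products to {account_name}'s portfolio. ----")
    print(f"The portfolio now contains {len(portfolio)} products.")
    print(f"Registering the order took {elapsed} seconds\n")
    return portfolio
```

With this version, two consecutive calls that omit `portfolio` return independent lists, and the reported time reflects only the current call.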

### Question 1.2
How would you find the *categories* present in *portfolio 1* that are not present in *portfolio 3*? Propose a solution:

In [None]:
### Write your solution here
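A minimal sketch of one approach: collect the categories of each portfolio into a set and take the set difference. The helper name `categories_only_in` is hypothetical, and the two lists below assume the portfolios as they would look once `add_products` no longer shares state between calls.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    amount: int
    price: float
    category: str


def categories_only_in(first, second):
    """Return the categories that appear in `first` but not in `second`."""
    return {p.category for p in first} - {p.category for p in second}


pf_1 = [
    Product("SPY", -25, 5.56, "ETF"),
    Product("MSFT", 13, 3.76, "Equity"),
    Product("FXAIX", -2, 16.24, "Fund"),
]
pf_3 = [
    Product("SPY", 2, 5.56, "ETF"),
    Product("VOO", -1, 25.21, "ETF"),
    Product("MSFT", 13, 79.56, "Equity"),
    Product("AAPL", -10, -23.5, "Equity"),
    Product("AMZN", 4, -274.5, "Equity"),
    Product("ETH-USD", 10, 1256.56, "CCC"),
    Product("BTC-USD", 1, 15_000.56, "CCC"),
    Product("AAPL", 12, 143.43, "Equity"),
]

print(categories_only_in(pf_1, pf_3))  # {'Fund'}
```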

### Question 1.3
We need to know which investments in *portfolio 1* are *short* (have a negative amount) for the categories **Equity** and **ETF**. How would you achieve that?

In [None]:
### Write your solution here
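One way to do this, as a sketch, is a list comprehension that keeps products with a negative `amount` whose `category` is in the requested set. The helper name `short_positions` and its `categories` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    amount: int
    price: float
    category: str


def short_positions(portfolio, categories=("Equity", "ETF")):
    """Return the products with a negative amount in the given categories."""
    wanted = set(categories)
    return [p for p in portfolio if p.amount < 0 and p.category in wanted]


pf_1 = [
    Product("SPY", -25, 5.56, "ETF"),
    Product("MSFT", 13, 3.76, "Equity"),
    Product("FXAIX", -2, 16.24, "Fund"),
]

# Only SPY qualifies: FXAIX is short too, but its category is "Fund".
print([p.name for p in short_positions(pf_1)])  # ['SPY']
```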

### Question 1.4 (optional)
For a given portfolio, how would you count the number of *products* in each *category*?

In [None]:
### Write your solution here
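A short sketch using `collections.Counter` from the standard library, which tallies one entry per product category. The helper name `count_by_category` is hypothetical; the sample list is the existing interviewer portfolio from the sample data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    amount: int
    price: float
    category: str


def count_by_category(portfolio):
    """Return a Counter mapping each category to its number of products."""
    return Counter(p.category for p in portfolio)


existing_interviewer_portfolio = [
    Product("SPY", 2, 5.56, "ETF"),
    Product("VOO", -1, 25.21, "ETF"),
    Product("MSFT", 13, 79.56, "Equity"),
    Product("AAPL", -10, -23.5, "Equity"),
    Product("AMZN", 4, -274.5, "Equity"),
]

counts = count_by_category(existing_interviewer_portfolio)
print(dict(counts))  # {'ETF': 2, 'Equity': 3}
```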

## Code Review

### Question 2.1
Look at the code below and express your opinion on its structure. What are your considerations? If you had to refactor it, which actions would you take?

In [None]:
from dataclasses import dataclass

ROLES = ['demo', 'default', 'admin']
        
@dataclass
class User:
    name: str
    role: str
    
    def get_records(self):
        if self.role=='admin':
            print( 'retrieving all the products available in the db')
        else:
            if self.role=='default':
                print(f'retrieving only the products in which {self.name} user is invested')
            elif self.role=='demo':
                print('retrieving only public available products')
            else:
                print(f"you don't have a valid role. The role should be one of [{', '.join(str(role) for role in ROLES)}]")


prospect = User('Richmond', 'demo')
client = User('Smart Guy', 'default')
it_guy = User('Maurice Moss', 'admin')
manager = User('Jen Barber', 'manager')


prospect.get_records()
client.get_records()
it_guy.get_records()
manager.get_records()
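One possible refactoring direction, as a sketch rather than a reference answer. The nested `if`/`else` chain can be flattened into a lookup table, the free-form role string can become an `Enum` validated when the user is created (so an invalid role such as 'manager' fails early instead of at query time), and the method can return a value instead of printing. The `Role` enum and the choice to raise `ValueError` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    DEMO = "demo"
    DEFAULT = "default"
    ADMIN = "admin"


@dataclass
class User:
    name: str
    role: Role

    def __post_init__(self):
        # Accept plain strings for convenience, but reject unknown roles early.
        try:
            self.role = Role(self.role)
        except ValueError:
            valid = ", ".join(r.value for r in Role)
            raise ValueError(
                f"invalid role {self.role!r}; expected one of [{valid}]"
            ) from None

    def get_records(self):
        # One entry per role replaces the nested if/else chain.
        messages = {
            Role.ADMIN: "retrieving all the products available in the db",
            Role.DEFAULT: f"retrieving only the products in which {self.name} is invested",
            Role.DEMO: "retrieving only publicly available products",
        }
        return messages[self.role]


print(User("Richmond", "demo").get_records())
try:
    User("Jen Barber", "manager")
except ValueError as error:
    print(error)
```

Returning the message keeps the method testable; the caller decides whether to print or log it.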

### Question 2.2
Look at the code below and express your opinion on its structure. What are your considerations? If you had to refactor it, which actions would you take? Does it have the same issues as the script above?

Suppose we want our Portfolio to support different data sources in the future (e.g., Yahoo Finance, a CSV file, an API or a database). The user should choose the source at runtime, but the Portfolio class should not know the details of how to build the correct fetcher. How would you design this?

In [None]:
import yfinance as yf
import pandas as pd

class portfolio():
    
    def __init__(self, products= None, frequency='Monthly'):
        self.products = products or  ['0005.HK', '0006.HK', '0066.HK', '0700.HK', '2800.HK']
        self.df = pd.DataFrame()
        self.frequency = frequency
        print('initialized')

    
    def download_prices(self, benchmarks= None):
        benchmarks= benchmarks or []
        stock_list = self.products
        data = yf.download(stock_list, start="2019-06-01", end="2020-02-21")
        print('data fields downloaded:', set(data.columns.get_level_values(0)))
        self.df = data.Close
        return self.df
    
    def performances(self):
        frequency = self.frequency
        try: 
            if frequency=='Daily':
                print('computing Daily perfs')
                perf= self.df.pct_change()
            elif frequency=='Monthly':
                print('computing Monthly perfs')
                perf=  self.df.sort_index().resample("M").apply(lambda x : x.iloc[-1]).pct_change()
            elif frequency == 'Weekly':
                print('computing Weekly perfs')
                perf=  self.df.sort_index().resample('W-MON', label='left', closed='left').apply(lambda x : x.iloc[-1]).pct_change()
            else:
                raise ValueError('the frequency requested is not available')
        except:
            print("couldn't compute performances")
        return perf

my_portfolio= portfolio(['GOLD', 'SI=F','AMZN','MSFT','AAPL'] )
my_portfolio.download_prices()
my_portfolio.df.drop('SI=F', axis=1)
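For the data-source part of the question, one possible design is a small fetcher interface plus a registry-based factory: the portfolio receives a ready-made fetcher and never learns how it was built. The sketch below uses only the standard library, and every name in it (`PriceFetcher`, `InMemoryFetcher`, `CsvFetcher`, `register_fetcher`, `make_fetcher`) is hypothetical. Prices are plain dictionaries rather than DataFrames to keep the example dependency-free; a Yahoo Finance fetcher would be registered the same way.

```python
import csv
import io
from typing import Callable, Dict, List, Protocol

Prices = Dict[str, List[float]]


class PriceFetcher(Protocol):
    def fetch(self, products: List[str]) -> Prices: ...


class InMemoryFetcher:
    def __init__(self, prices: Prices):
        self._prices = prices

    def fetch(self, products: List[str]) -> Prices:
        return {p: self._prices[p] for p in products}


class CsvFetcher:
    """Reads 'ticker,close' rows from CSV text (a file would work the same way)."""

    def __init__(self, text: str):
        self._text = text

    def fetch(self, products: List[str]) -> Prices:
        prices: Prices = {p: [] for p in products}
        for row in csv.DictReader(io.StringIO(self._text)):
            if row["ticker"] in prices:
                prices[row["ticker"]].append(float(row["close"]))
        return prices


# Registry: maps a source name chosen at runtime to a builder.
_FETCHERS: Dict[str, Callable[..., PriceFetcher]] = {}


def register_fetcher(name: str, builder: Callable[..., PriceFetcher]) -> None:
    _FETCHERS[name] = builder


def make_fetcher(name: str, **options) -> PriceFetcher:
    try:
        builder = _FETCHERS[name]
    except KeyError:
        raise ValueError(
            f"unknown source {name!r}; available: {sorted(_FETCHERS)}"
        ) from None
    return builder(**options)


register_fetcher("memory", InMemoryFetcher)
register_fetcher("csv", CsvFetcher)


class Portfolio:
    def __init__(self, products: List[str], fetcher: PriceFetcher):
        self.products = list(products)
        self._fetcher = fetcher  # injected; Portfolio does not build it

    def download_prices(self) -> Prices:
        return self._fetcher.fetch(self.products)


fetcher = make_fetcher("csv", text="ticker,close\nAAPL,10.0\nAAPL,11.0\nMSFT,20.0\n")
my_portfolio = Portfolio(["AAPL", "MSFT"], fetcher)
print(my_portfolio.download_prices())  # {'AAPL': [10.0, 11.0], 'MSFT': [20.0]}
```

Adding a new source then means writing one class and one `register_fetcher` call, with no change to `Portfolio`.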



### Question 3
Suppose you receive a daily dataset of 100M trade records (CSV files in cloud storage). You need to:

- Clean and aggregate them with PySpark (e.g., compute per-ticker daily PnL), and

- Expose the aggregated data via a FastAPI service for downstream dashboards.

How would you design and implement this pipeline? What considerations would you keep in mind?