# Pandas — Assessment

This assessment aligns with `Pandas/Pandas.ipynb` and covers Series/DataFrame basics, indexing, I/O, grouping, merging, time series, and vectorized operations.

Total questions: 25 (10 Theory, 8 Fill-in-the-Blanks, 7 Coding). Difficulty mix: 40% easy, 40% medium, 20% hard.


## Instructions
- Answer all questions.
- Implement functions and run the asserts.
- Prefer idiomatic Pandas over loops.
- Solutions are provided at the bottom.


## References
- `Pandas/Pandas.ipynb`


## Part A — Theory (10)
1. What is the difference between a Pandas Series and a DataFrame?
2. MCQ: Which indexer is label-based? (a) `iloc` (b) `iat` (c) `loc` (d) slicing `:`
3. Explain the difference between a view and a copy (`copy()`) when slicing a DataFrame.
4. What is the purpose of the `axis` argument in operations such as `sum` and `mean`?
5. MCQ: Which performs an inner join? (a) `merge(how='left')` (b) `merge(how='outer')` (c) `merge(how='inner')` (d) `concat(axis=1)`
6. When would you use `groupby().agg()` versus `pivot_table()`?
7. Explain the chained-assignment warning and how to avoid it.
8. What’s the difference between `apply`, `map`, and `applymap`?
9. MCQ: Which is best for reading a large CSV in chunks? (a) `read_csv(chunksize=...)` (b) `read_csv(low_memory=False)` (c) `read_table()` (d) `json_normalize()`
10. How does Pandas handle missing data? Name two functions helpful for NA handling.


## Part B — Fill in the Blanks (8)
1. Label-based selection uses the __________ indexer.
2. Position-based selection uses the __________ indexer.
3. To combine DataFrames row-wise we can use `pd.__________( [df1, df2], axis=0 )`.
4. To compute multiple aggregations per group we can pass a __________ to `agg`.
5. To parse a date column while reading CSV, use `read_csv(__________)`.
6. `df.isna()` returns a DataFrame of __________ values indicating missingness.
7. To set a column as the index in place, rather than getting back a new DataFrame, call `df.set_index('col', ________)`.
8. A categorical column can reduce memory because it stores __________ and codes.


## Part C — Coding Tasks (7)
Implement the functions below using Pandas.

Tasks:
1. `select_top_n(df, col, n)` — return rows with largest `n` values in `col`.
2. `normalize_numeric(df)` — return a copy where numeric columns are standardized (z-score), non-numerics unchanged.
3. `group_mean(df, by, target)` — return a Series with mean of `target` grouped by `by` sorted descending.
4. `merge_customers_orders(customers, orders)` — inner join on `customer_id`.
5. `add_total_col(df, cols, name)` — add a new column `name` as row-wise sum across `cols`.
6. `month_over_month(df, date_col, value_col)` — return a DataFrame with columns `[date_col, value_col, 'mom']`, sorted by date, where `mom` is the fractional change versus the previous row (e.g. `0.10` for +10%); the input is assumed to hold one row per month.
7. `fill_missing_forward(df, col)` — forward-fill missing values of `col` in-place and return df.


In [None]:
import pandas as pd
import numpy as np

def select_top_n(df: pd.DataFrame, col: str, n: int) -> pd.DataFrame:
    return df.nlargest(n, col)

def normalize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    num = out.select_dtypes(include=[np.number])
    mu = num.mean()
    sd = num.std(ddof=0).replace(0, 1)
    out[num.columns] = (num - mu) / sd
    return out

def group_mean(df: pd.DataFrame, by: str, target: str) -> pd.Series:
    return df.groupby(by)[target].mean().sort_values(ascending=False)

def merge_customers_orders(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    return customers.merge(orders, on='customer_id', how='inner')

def add_total_col(df: pd.DataFrame, cols: list, name: str) -> pd.DataFrame:
    df = df.copy()
    df[name] = df[cols].sum(axis=1)
    return df

def month_over_month(df: pd.DataFrame, date_col: str, value_col: str) -> pd.DataFrame:
    d = df.copy()
    d[date_col] = pd.to_datetime(d[date_col])
    d = d.sort_values(date_col)
    d['mom'] = d[value_col].pct_change()
    return d[[date_col, value_col, 'mom']]

def fill_missing_forward(df: pd.DataFrame, col: str) -> pd.DataFrame:
    df[col] = df[col].ffill()
    return df


In [None]:
# Asserts
df = pd.DataFrame({'a':[5,1,3,9], 'b':[10,20,30,40]})
assert list(select_top_n(df,'a',2)['a']) == [9,5]

_norm = normalize_numeric(df)
assert np.allclose(_norm['a'].mean(), 0, atol=1e-7)

df2 = pd.DataFrame({'grp':['x','x','y'], 'val':[1,3,5]})
gm = group_mean(df2,'grp','val')
assert list(gm.index) == ['y','x']  # means: y=5.0, x=2.0, so descending order puts y first

cust = pd.DataFrame({'customer_id':[1,2], 'name':['A','B']})
ordr = pd.DataFrame({'order_id':[11,12], 'customer_id':[1,1]})
mo = merge_customers_orders(cust, ordr)
assert set(mo.columns) == {'customer_id','name','order_id'}

df3 = add_total_col(pd.DataFrame({'x':[1,2],'y':[3,4]}), ['x','y'], 't')
assert list(df3['t']) == [4,6]

df4 = pd.DataFrame({'date': ['2020-01-01','2020-02-01','2020-03-01'], 'v':[100,110,99]})
mom = month_over_month(df4,'date','v')
assert 'mom' in mom.columns and len(mom)==3

df5 = pd.DataFrame({'c':[np.nan, 1, np.nan, 2]})
fill_missing_forward(df5,'c')
assert np.isnan(df5['c'].iloc[0])  # NaN never equals NaN, so check the leading gap separately
assert list(df5['c'].iloc[1:]) == [1.0,1.0,2.0]

print('Pandas asserts passed ✅')


## Solutions

### Theory (sample)
1. Series is 1D labeled array; DataFrame is 2D labeled table.
2. (c) `loc`
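3. A slice may return either a view or a copy depending on the operation, so writes to it may or may not reach the original; `copy()` always returns an independent object.
4. `axis` selects the dimension that is collapsed: `axis=0` operates down the rows (one result per column), `axis=1` across the columns (one result per row).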
5. (c) inner join.
6. `groupby().agg()` for grouped aggregations; `pivot_table()` for reshaping with aggregations.
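7. Occurs when assigning through chained indexing (e.g. `df[mask]['col'] = v`), where the intermediate object may be a copy; avoid it with a single `.loc[rows, cols] = v` assignment, or take an explicit `.copy()` before modifying a subset.
8. `map` works elementwise on a Series, `apply` works on a Series or along an axis of a DataFrame, and `applymap` works elementwise on a DataFrame (deprecated since pandas 2.1 in favor of `DataFrame.map`).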
9. (a) `chunksize`.
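10. Missing values are represented by `NaN` (or `NaT` / `pd.NA`) and are skipped by default in reductions such as `sum` and `mean`; helpers include `isna`, `fillna`, and `dropna`.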
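
A few of the theory answers are easier to see on a small frame. The sketch below is only an illustration: the frame and its column names (`region`, `product`, `sales`, `returns`) are made up for this example and do not come from the notebook.

```python
import pandas as pd

# Hypothetical data used only for this illustration.
df = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "product": ["a", "b", "a", "b"],
        "sales": [10, 20, 30, 40],
        "returns": [1, 2, 3, 4],
    }
)

# Answer 4: axis=0 collapses the rows (one result per column),
# axis=1 collapses the columns (one result per row).
col_totals = df[["sales", "returns"]].sum(axis=0)  # sales 100, returns 10
row_totals = df[["sales", "returns"]].sum(axis=1)  # 11, 22, 33, 44

# Answers 3 and 7: take an explicit copy of the subset, then assign through
# a single .loc call instead of chained indexing like subset[mask]["sales"] = 0.
subset = df[df["region"] == "south"].copy()
subset.loc[subset["product"] == "a", "sales"] = 0  # df itself is unchanged

# Answer 6: the same per-group sums, in long form via groupby().agg()
# and reshaped into a region x product table via pivot_table().
long_form = df.groupby(["region", "product"])["sales"].agg("sum")
wide_form = df.pivot_table(
    index="region", columns="product", values="sales", aggfunc="sum"
)
```

Here `long_form` is a Series indexed by `(region, product)` pairs, while `wide_form` spreads the `product` values across the columns; which shape is preferable depends on what consumes the result.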

### Fill blanks
1. `loc`
2. `iloc`
3. `concat`
4. dict (a list of functions also works)
5. `parse_dates=...`
6. boolean
7. `inplace=True`
8. categories
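
Several of the fill-in answers can be exercised together on a tiny in-memory CSV. This is a minimal sketch under stated assumptions: the CSV text and the column names (`date`, `city`, `amount`) are hypothetical and exist only for this illustration.

```python
import io

import pandas as pd

# Hypothetical CSV; the second row has a missing amount.
csv_text = (
    "date,city,amount\n"
    "2020-01-01,Oslo,10\n"
    "2020-01-02,Oslo,\n"
    "2020-01-03,Rome,30\n"
)

# Blank 5: parse_dates converts the named column to datetime while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Blank 6: isna() returns booleans marking the missing cells.
missing = df.isna()

# Blank 3: concat with axis=0 stacks the frames row-wise.
stacked = pd.concat([df, df], axis=0, ignore_index=True)

# Blank 4: a dict maps each column to one or more aggregations.
summary = df.groupby("city").agg({"amount": ["sum", "count"]})

# Blank 8: a categorical stores each distinct category once, plus integer codes.
city_cat = df["city"].astype("category")
codes = city_cat.cat.codes.tolist()
categories = list(city_cat.cat.categories)
```

Passing a dict to `agg` yields two-level column labels such as `('amount', 'sum')`, which is worth knowing before selecting from `summary`.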