# Data Analysis Basics

*Load → Inspect → Analyse → Model → Visualise*

### Learning goals
1. Load experimental data from a CSV file.
2. Compute descriptive statistics (mean, standard deviation, etc.).
3. Perform a simple linear fit to the data.
4. Plot raw data and fitted model using **matplotlib**.
5. Interpret residuals and goodness‑of‑fit.

## 1 · Import libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## 2 · Load experimental data

In [None]:
df = pd.read_csv('sample_experiment.csv')  # ensure the CSV is in the same folder
df.head()

### Quick inspection

In [None]:
df.info()
df.describe()
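
Before computing statistics it can help to check for missing values, since NumPy functions such as `np.mean` return `nan` when the input contains `NaN`. The sketch below uses a small made-up DataFrame (`df_demo` and its values are hypothetical, not taken from `sample_experiment.csv`) to show one possible way of counting and dropping incomplete rows:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the experiment data, with one missing reading
df_demo = pd.DataFrame({
    'time_s': [0.0, 1.0, 2.0, 3.0],
    'voltage_V': [0.1, np.nan, 0.5, 0.7],
})

# Count missing entries per column
missing = df_demo.isna().sum()
print(missing)

# Drop rows containing any missing value and renumber the index
clean = df_demo.dropna().reset_index(drop=True)
print(clean)
```

Whether dropping rows is appropriate depends on the experiment; it is only one of several reasonable choices.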

## 3 · Basic statistics

In [None]:
# Using pandas built‑ins
mean_v = df['voltage_V'].mean()
std_v = df['voltage_V'].std(ddof=0)  # population std; pandas defaults to ddof=1 (sample std)
print(f'mean = {mean_v:.3f} V,  σ = {std_v:.3f} V')

# Same with NumPy (np.std defaults to ddof=0, so it matches the pandas value above)
mean_np = np.mean(df['voltage_V'].to_numpy())
std_np = np.std(df['voltage_V'].to_numpy())
print(f'(NumPy) mean = {mean_np:.3f} V,  σ = {std_np:.3f} V')
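
The `ddof` argument is easy to overlook: `Series.std` in pandas defaults to `ddof=1` (sample standard deviation), while `np.std` defaults to `ddof=0` (population standard deviation). The two results above agree only because `ddof=0` is passed to pandas explicitly. A minimal sketch on made-up values (the numbers are illustrative, not from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical readings, used only to illustrate the ddof difference
v = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

pop_std = v.std(ddof=0)            # divides by N
sample_std = v.std()               # pandas default ddof=1: divides by N - 1
numpy_std = np.std(v.to_numpy())   # NumPy default ddof=0: divides by N

print(f'population σ = {pop_std:.3f}, sample s = {sample_std:.3f}, NumPy = {numpy_std:.3f}')
```

For small samples the difference is noticeable, so it is worth stating which convention a reported σ uses.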

### Custom helper (optional)

In [None]:
def rms(arr):
    """Root‑mean‑square of a 1‑D array"""
    return np.sqrt(np.mean(np.square(arr)))

print(f"RMS voltage: {rms(df['voltage_V']):.3f} V")

## 4 · Visualising the data

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['time_s'], df['voltage_V'], label='measurement', alpha=0.8)
ax.set_xlabel('Time [s]')
ax.set_ylabel('Voltage [V]')
ax.set_title('Experimental data')
ax.legend()
plt.show()

## 5 · Linear fit

In [None]:
coeffs = np.polyfit(df['time_s'], df['voltage_V'], deg=1)  # slope, intercept
slope, intercept = coeffs
print(f'slope = {slope:.3f} V/s,  intercept = {intercept:.3f} V')

fit_line = np.poly1d(coeffs)

fig, ax = plt.subplots()
ax.scatter(df['time_s'], df['voltage_V'], alpha=0.8, label='measurement')
ax.plot(df['time_s'], fit_line(df['time_s']), color='red', label='linear fit')
ax.set_xlabel('Time [s]')
ax.set_ylabel('Voltage [V]')
ax.set_title('Linear fit to data')
ax.legend()
plt.show()

### Residual analysis

In [None]:
residuals = df['voltage_V'] - fit_line(df['time_s'])
rmse = np.sqrt(np.mean(residuals**2))
print(f'RMSE = {rmse:.3f} V')

fig, ax = plt.subplots()
ax.hist(residuals, bins=10)
ax.set_xlabel('Residual [V]')
ax.set_ylabel('Count')
ax.set_title('Residual distribution')
plt.show()
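
RMSE is expressed in volts, so it is hard to judge without knowing the spread of the data. One common complementary measure of goodness of fit is the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below computes it on synthetic data (the arrays `t` and `v` are assumed stand-ins for `time_s` and `voltage_V`, not the real measurements):

```python
import numpy as np

# Synthetic stand-in data: a known line plus Gaussian noise (assumed values)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
v = 0.5 * t + 1.0 + rng.normal(scale=0.1, size=t.size)

# Least-squares line and its residuals
slope, intercept = np.polyfit(t, v, deg=1)
residuals = v - (slope * t + intercept)

ss_res = np.sum(residuals**2)          # residual sum of squares
ss_tot = np.sum((v - v.mean())**2)     # total sum of squares about the mean
r_squared = 1.0 - ss_res / ss_tot

print(f'R² = {r_squared:.4f}, mean residual = {residuals.mean():.2e} V')
```

An R² close to 1 means the line accounts for most of the variance in the data. The mean residual of a least-squares fit with an intercept is zero by construction, so a plot of residuals against time is usually more informative than their mean for spotting curvature or drift.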


## 6 · Your turn ✍️

1. **Median & MAD** – compute the median voltage and median absolute deviation (MAD).  
2. **Alternative fit** – try fitting a quadratic (`deg=2`) and compare RMSE.  
3. **Export** – save the DataFrame, after any cleaning you choose to apply, to `clean_experiment.csv`.
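
As a starting point for the first exercise, the MAD is the median of the absolute deviations from the median. The sketch below applies that definition to made-up readings containing one outlier (the values are hypothetical); adapting it to `df['voltage_V']` is left to you:

```python
import pandas as pd

# Hypothetical readings with one outlier, for illustration only
v = pd.Series([1.0, 1.1, 0.9, 1.0, 5.0])

median_v = v.median()
mad_v = (v - median_v).abs().median()   # median of |x - median(x)|

print(f'median = {median_v:.3f} V, MAD = {mad_v:.3f} V, σ = {v.std(ddof=0):.3f} V')
```

In this example the single outlier inflates σ far more than it affects the MAD, which is why the median and MAD are often preferred for data with occasional bad readings.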


## 7 · Summary
You loaded a dataset, computed descriptive statistics, performed a linear fit, visualised both the raw data and the fitted model, and examined the residuals. These are the essential steps of a basic data‑analysis workflow, ready to apply to your own experimental measurements.