# 🧹 Prepare Phase – Bellabeat Smart Device Usage Case Study

## 📂 Data Source

The dataset used in this analysis is the **[FitBit Fitness Tracker Data (CC0: Public Domain)](https://www.kaggle.com/datasets/arashnic/fitbit)**, published on Kaggle by Mobius. It contains data from **30 Fitbit users** who consented to share their personal fitness tracking data.

- **Original Author**: Mobius
- **License**: CC0: Public Domain
- **Access**: Publicly available on Kaggle
- **Format**: `.csv` files

### 📁 Files Used

The repository contains the following 11 CSV files inside the `/data/` directory:

- `dailyActivity_merged.csv`
- `heartrate_seconds_merged.csv`
- `hourlyCalories_merged.csv`
- `hourlyIntensities_merged.csv`
- `hourlySteps_merged.csv`
- `minuteCaloriesNarrow_merged.csv`
- `minuteIntensitiesNarrow_merged.csv`
- `minuteMETsNarrow_merged.csv`
- `minuteSleep_merged.csv`
- `minuteStepsNarrow_merged.csv`
- `weightLogInfo_merged.csv`

---

## 🗂️ Data Structure

- The data is primarily stored in **long (narrow) format**, with each row representing one observation for a specific user and timestamp (daily, hourly, per minute, or per second).
- Each file captures a **different aspect of user behavior** such as steps, heart rate, calories, or sleep.

| File | Granularity | Purpose |
|------|-------------|---------|
| `dailyActivity_merged.csv` | Daily | Steps, distance, activity level |
| `heartrate_seconds_merged.csv` | Per second | Heart rate values |
| `minuteSleep_merged.csv` | Per minute | Sleep intervals |
| `weightLogInfo_merged.csv` | Per entry | Weight and BMI logs |
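
The granularity listed for each file can be checked rather than taken on trust. The sketch below is a minimal example and not part of the original notebook: it assumes every table has a user column named `Id` and a single timestamp column (for example `ActivityHour`), and both names should be adjusted to match the actual headers. The helper names are hypothetical.

```python
import pandas as pd


def has_unique_keys(df, time_col, id_col="Id"):
    """True if each (user, timestamp) pair appears at most once (long format)."""
    return not df.duplicated([id_col, time_col]).any()


def median_interval(df, time_col, id_col="Id"):
    """Median gap between consecutive timestamps, computed within each user."""
    ordered = (
        df.assign(_ts=pd.to_datetime(df[time_col]))  # parse timestamp strings
        .sort_values([id_col, "_ts"])
    )
    # diff() inside each user group avoids measuring gaps across users
    gaps = ordered.groupby(id_col)["_ts"].diff().dropna()
    return gaps.median()


# Made-up rows for illustration only; column names are assumptions.
demo = pd.DataFrame(
    {
        "Id": [1, 1, 1, 2, 2],
        "ActivityHour": [
            "2016-04-12 00:00:00",
            "2016-04-12 01:00:00",
            "2016-04-12 02:00:00",
            "2016-04-12 00:00:00",
            "2016-04-12 01:00:00",
        ],
    }
)
print(has_unique_keys(demo, "ActivityHour"))
print(median_interval(demo, "ActivityHour"))
```

A median interval that differs from the granularity in the table (for example, several seconds between heart rate readings) would be worth noting before aggregation.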

---

## ✅ ROCCC Assessment

The ROCCC acronym checks the credibility of the data source:

| Check | Assessment |
|-------|------------|
| **Reliable** | Hosted on a reputable platform (Kaggle) with anonymized records, though the small sample (30 users) limits reliability |
| **Original** | Collected directly from Fitbit devices |
| **Comprehensive** | Includes multiple health dimensions (sleep, activity, heart rate) |
| **Current** | Dated 2016 – not recent, but still relevant for understanding general behavior |
| **Cited** | Dataset is well-referenced on Kaggle |

---

## 🔐 Licensing, Privacy & Security

- The data is **openly licensed** as **Public Domain (CC0)**.
- Personally identifiable information is **removed**, protecting user privacy.
- No additional privacy or security measures are required for this dataset.

---

## 🔎 Data Integrity Checks

- Data was loaded into Jupyter Notebooks using `pandas.read_csv()`.
- Initial checks:
  - `.head()` was used to preview each file and verify its structure.
  - `.info()` and `.describe()` were used to check data types, non-null counts, and summary statistics.
- All files were successfully read and matched the expected schema.
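
The checks above can be run over every file in one pass. The following is a minimal sketch, not the exact notebook code: the helper names `summarize_frame` and `summarize_directory` are hypothetical, and the relative path `data` is assumed to point at the `/data/` directory.

```python
from pathlib import Path

import pandas as pd


def summarize_frame(name, df):
    """Collect basic integrity facts for one already-loaded table."""
    return {
        "file": name,
        "rows": len(df),
        "columns": len(df.columns),
        "null_cells": int(df.isna().sum().sum()),      # total missing values
        "duplicate_rows": int(df.duplicated().sum()),  # exact repeated rows
    }


def summarize_directory(data_dir="data"):
    """Read every CSV in data_dir and return one summary row per file."""
    rows = [
        summarize_frame(path.name, pd.read_csv(path))
        for path in sorted(Path(data_dir).glob("*.csv"))
    ]
    return pd.DataFrame(rows)
```

Calling `summarize_directory("data")` returns a small table that is easier to scan than eleven separate `.info()` printouts, and it surfaces duplicate rows, which `.info()` and `.describe()` do not report.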

---

## ⚠️ Potential Limitations

- **Sample Size**: Only 30 users – may not generalize to broader populations.
- **Date Range**: Limited to a 2-month period in 2016.
- **Demographic Info**: Missing age, gender, and location data.
- **Device Type**: All users wore Fitbits – Bellabeat users may behave differently.
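
The sample size and date range limitations can be quantified per file. The sketch below is illustrative only: it assumes a user column named `Id` and a date column such as `ActivityDate`, the `coverage` helper is hypothetical, and the demo rows are made up.

```python
import pandas as pd


def coverage(df, time_col, id_col="Id"):
    """Report how many users and how many calendar days a table covers."""
    ts = pd.to_datetime(df[time_col])
    first_day = ts.min().normalize()  # drop the time-of-day component
    last_day = ts.max().normalize()
    return {
        "users": int(df[id_col].nunique()),
        "start": first_day,
        "end": last_day,
        "span_days": (last_day - first_day).days + 1,  # inclusive of both ends
    }


# Made-up rows for illustration only; column names are assumptions.
demo = pd.DataFrame(
    {
        "Id": [1, 1, 2, 2],
        "ActivityDate": ["2016-04-12", "2016-04-14", "2016-04-12", "2016-04-13"],
    }
)
print(coverage(demo, "ActivityDate"))
```

Running this on each file shows whether every file covers the same users and the same period, which matters before any tables are joined.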

---

### 🔁 Next Step

We will now move to the **Process phase**, where we’ll clean and prepare these datasets for analysis in Python.
