# Wellness Tech Company Analysis Report
> Author: Hannan Khan  
> Last Updated: 2022-02-25 07:09:30

## Table Of Contents:
* [Current Status](#Current-Status)
* [Business Task](#Business-Task)
* [Data Sources](#Data-Sources)
* [Data Cleaning](#Data-Cleaning)
    * [Loading Libraries](#Loading-Libraries)
    * [Creating Database Objects](#Creating-Database-Objects)
    * [Common Query Functions](#Common-Query-Functions)
* [Data Preparation-Processing](#Data-Preparation-Processing)
    * [Looking At All Tables](#Looking-At-All-Tables,-Their-Columns-&-Datatypes)
* [Analysis](#Analysis)
    * [Analysis Summary](#Analysis-Summary)
* [Actions](#Actions)
* [Appendix](#Appendix)
    * [Definitions](#Definitions)
    * [Files Used](#Files-Used)

## Current Status
The company `Bellabeat` creates health-focused smart technology products for women. The data gathered from these products includes activity, sleep, stress, and reproductive health data.  
  
The company currently invests in traditional advertising media. The co-founder believes that an analysis of smart technology usage data will reveal more opportunities for business growth.

## Business Task
The current goal of `Bellabeat` is to gain insight into how people use smart devices. These insights can then be used to identify potential trends within `Bellabeat` customers' data. The insights will also be used to inform the company's marketing strategy.  
The **goal of this report is to analyze non-Bellabeat smart device usage data to gain insights about smart device usage trends.**

## Data Sources
The data is publicly available [here](https://www.kaggle.com/arashnic/fitbit). It was collected during `03/2016 - 05/2016` through an anonymized survey program by which `30` respondents submitted their personal Fitbit data. The data is made available under the [CC0 1.0 Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).  
The dataset contains various files, each of which can be characterized temporally by day, hour, or minute. The files include data on:

* Calories
* Intensities
* Steps
* Sleep
* and more...

The dataset in question is:

* *Reliable*: From a distributed anonymized survey.
* *Original*: This is novel data.
* *Comprehensive*: This data covers enough features for us to analyze the smart device usage of 30 users.
* _**NOT** Current_: Collected during `03/2016 - 05/2016`.
* *Cited*: See [Data Sources](#Data-Sources).

This dataset was analyzed with SQL through Python's `sqlite3` library. The notebook for creating and loading a database with the dataset's files can be found [here](https://github.com/hannankhan888/Data_Science_Portfolio/tree/main/Wellness_Tech_Company_Analysis_Case_Study/Database_Creator_Loader.ipynb).
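As a rough illustration of what loading one of the dataset's CSV files into SQLite can look like, the sketch below reads a small in-memory CSV and inserts it into a table. It is not taken from the linked notebook: the helper name `load_csv_into_table`, the sample rows, and the in-memory database are all assumptions made for the example. Only the column names `Id`, `ActivityDate`, and `TotalSteps` come from the `daily` table described later in this report.

```python
import csv
import io
import sqlite3

# Illustrative rows only; these values are made up, not taken from the dataset.
SAMPLE_CSV = """Id,ActivityDate,TotalSteps
1,2016-04-12,1000
1,2016-04-13,2000
2,2016-04-12,3000
"""

def load_csv_into_table(con, csv_file, table):
    """Create `table` from the CSV header and insert every data row (hypothetical helper)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Identifiers cannot be bound as parameters, so they are quoted instead.
    cols = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    con.execute(f'CREATE TABLE "{table}" ({cols})')
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    con.commit()

con = sqlite3.connect(":memory:")
load_csv_into_table(con, io.StringIO(SAMPLE_CSV), "daily")
row_count = con.execute('SELECT COUNT(*) FROM "daily"').fetchone()[0]
print("rows loaded:", row_count)
```

Because the columns are created without declared types, every value is stored as text in this sketch. A real loader would likely declare types such as `INT` and `NUM`.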

## Data Cleaning
The data was concatenated into `daily`, `minuteNarrow`, `minuteWide`, and `hourly` SQL tables using [this](https://github.com/hannankhan888/Data_Science_Portfolio/tree/main/Wellness_Tech_Company_Analysis_Case_Study/Data_Concatenation.ipynb) notebook.  
Because the data was already cleaned in that notebook, this section of the report is instead used to initialize all variables and functions.  
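Even when cleaning happens elsewhere, a few sanity checks can confirm that the loaded tables are in the expected state. The sketch below is an assumption about what such checks might look like, not a description of the concatenation notebook. It builds a tiny in-memory `daily` table with made-up rows, then looks for repeated `(Id, ActivityDate)` pairs and missing `TotalSteps` values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily (Id INT, ActivityDate NUM, TotalSteps INT)")
# Made-up rows: one duplicated (Id, ActivityDate) pair and one missing step count.
con.executemany(
    "INSERT INTO daily VALUES (?, ?, ?)",
    [(1, "2016-04-12", 1000), (1, "2016-04-12", 1000), (2, "2016-04-12", None)],
)

# User/date combinations that appear more than once.
duplicate_keys = con.execute(
    """
    SELECT Id, ActivityDate, COUNT(*) AS n
    FROM daily
    GROUP BY Id, ActivityDate
    HAVING COUNT(*) > 1
    """
).fetchall()

# Number of rows with a missing step count.
null_steps = con.execute(
    "SELECT COUNT(*) FROM daily WHERE TotalSteps IS NULL"
).fetchone()[0]

print("duplicate keys:", duplicate_keys)
print("rows with NULL TotalSteps:", null_steps)
```

On cleaned data, both checks would be expected to come back empty or zero.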

### Loading Libraries

In [6]:
import sqlite3 as sql
from pprint import pprint
import os

### Creating Database Objects

In [3]:
data_dir = r"D:\Datasets\Fitabase_Data"
# build the path to the database file with os.path.join:
db_path = os.path.join(data_dir, "database.db")

# create a connection and cursor to the database:
db_con = sql.connect(db_path)
cur = db_con.cursor()

# test if connection works:
cur.execute("SELECT name FROM sqlite_master WHERE type='table';")
print("Tables:")
pprint(cur.fetchall())

Tables:
[('daily',), ('minuteNarrow',), ('minuteWide',), ('hourly',)]
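One caveat with the connection test above is that `sqlite3.connect` creates a new, empty database file when the path does not exist, so a mistyped path shows up as an empty table list rather than an error. The sketch below shows one possible guard. The helper name `list_tables` and the throwaway `demo.db` file are assumptions for illustration and are not part of the report's code.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

def list_tables(db_path):
    """Return the table names in an existing SQLite file (hypothetical helper)."""
    # sqlite3.connect() would silently create an empty file, so check first.
    if not os.path.isfile(db_path):
        raise FileNotFoundError(f"No database file at {db_path}")
    # closing() guarantees the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as con:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]

# Demonstration on a throwaway database rather than the report's database.db.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.db")
    with closing(sqlite3.connect(demo_path)) as con:
        con.execute("CREATE TABLE daily (Id INT)")
        con.execute("CREATE TABLE hourly (Id INT)")
        con.commit()
    tables = list_tables(demo_path)
    try:
        list_tables(os.path.join(tmp_dir, "missing.db"))
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(tables)
```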


### Common Query Functions

In [4]:
def get_tables_and_cols(cur):
    """ Gets the tables and columns from a cursor object.
    RETURNS tables_dict: a dict where every key is a table name and every
                        value is a list of column names.
            tables_cols: a list of tuples with table name and column name."""
    get_cols = """
    SELECT
      m.name,
      p.name
    FROM 
      sqlite_master AS m
    JOIN 
      pragma_table_info(m.name) AS p
    WHERE m.type='table'
    """
    cur.execute(get_cols)
    tables_cols = cur.fetchall()

    # group the column names by table in a single pass:
    tables_dict = {}
    for table, col in tables_cols:
        tables_dict.setdefault(table, []).append(col)
    return tables_dict, tables_cols

def get_database_table_count(cur):
    """ Prints the number of tables from a cursor object."""
    
    cur.execute("SELECT COUNT(*) FROM sqlite_master WHERE type = 'table';")
    # COUNT(*) returns a single row holding a single value:
    print("How many tables do we have:", cur.fetchone()[0])

def get_all_pragma_tables(cur):
    """ Gets the pragma (schema) tables for all tables from the cursor object.
    Prints result as a table."""
    
    all_pragma_tables_query = """
    SELECT 
      m.name as table_name,
      p.*
    FROM 
      sqlite_master AS m
    JOIN 
      pragma_table_info(m.name) AS p
    ORDER BY 
      m.name, 
      p.cid
    """

    cur.execute(all_pragma_tables_query)
    row_format = "|{:30}|{:3}|{:25}|{:10}|{:7}|{:11}|{:2}|"
    print("|:::::::::::::::::::::::::::::::::::::ALL PRAGMA TABLES::::::::::::::::::::::::::::::::::::::::|")
    print(row_format.format("table_name", "cid", "col_name", "type", "notnull", "dflt_values", "pk"))
    print("|",":"*92,"|")
    for row in cur.fetchall():
        # convert every value to str so that None and ints can be padded:
        print(row_format.format(*[str(value) for value in row]))

def get_table_col_example(cur):
    """ Gets the table name, column name, and an example data point from
    that column.
    Prints results in neat table."""
    
    _, tables_cols = get_tables_and_cols(cur)

    print("|{:30}|{:25}|{:25}|".format("TABLE", "COLUMN", "EXAMPLE"))
    print("|","="*80,"|")
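    for table,col in tables_cols:
        # identifiers cannot be bound as parameters, so they are quoted instead:
        get_example = f"""
        SELECT "{col}"
        FROM "{table}"
        LIMIT 1
        """
        cur.execute(get_example)
        row = cur.fetchone()
        # an empty table has no example row:
        example = str(row[0]) if row is not None else ""
        print("|{:30}|{:25}|{:25}|".format(table, col, example))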

def get_tables_with_num_rows(cur):
    """ Prints table names along with the number of rows in that table."""
    
    tables_dict, _ = get_tables_and_cols(cur)
    print("Number of rows in each table:")
    for table in sorted(tables_dict.keys()):
        cur.execute(f'SELECT COUNT(*) FROM "{table}"')
        # COUNT(*) returns a single row holding a single value:
        print("{:32}{:>15}".format(table, cur.fetchone()[0]))

def get_table_cols_with_dtypes(cur, table_name):
    """Prints the table column names, along with their associated data types."""
    
    # bind the table name as a parameter instead of formatting it into the SQL;
    # pragma_table_info returns one row per column, sorted here by column name:
    get_new_table_cols = """
    SELECT p.name AS columns, p.type AS dtype
    FROM pragma_table_info(?) AS p
    ORDER BY p.name;"""

    cur.execute(get_new_table_cols, (table_name,))
    print(f"`{table_name}` table cols:")
    pprint(cur.fetchall())
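Most of the helpers above rely on the same technique: joining `sqlite_master` to the table-valued form of `pragma_table_info`, which yields one row per column of each table. The sketch below isolates that join on a small in-memory database so it can be run without the Fitbit data. The `hourly` columns are placeholders chosen for the example, and table-valued pragma functions require a reasonably recent SQLite build (3.16.0 or later, to the best of my knowledge).

```python
import sqlite3
from collections import defaultdict

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily (Id INT, ActivityDate NUM, TotalSteps INT)")
# Placeholder columns; the real hourly table is not reproduced here.
con.execute("CREATE TABLE hourly (Id INT, Steps INT)")

# Each sqlite_master row is joined to the column rows that
# pragma_table_info() returns for that table name.
rows = con.execute(
    """
    SELECT m.name, p.name, p.type
    FROM sqlite_master AS m
    JOIN pragma_table_info(m.name) AS p
    WHERE m.type = 'table'
    ORDER BY m.name, p.cid
    """
).fetchall()

# Group the (column, declared type) pairs by table name.
schema = defaultdict(list)
for table, column, dtype in rows:
    schema[table].append((column, dtype))
schema = dict(schema)

for table, columns in schema.items():
    print(table, columns)
```

Ordering by `p.cid` keeps the columns in their declared order, which is usually what a schema overview needs.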

## Data Preparation-Processing
The data was prepared and preprocessed using the [data concatenation notebook](https://github.com/hannankhan888/Data_Science_Portfolio/tree/main/Wellness_Tech_Company_Analysis_Case_Study/Data_Concatenation.ipynb). The only feature engineering step was adding a new `TotalActiveMinutes` column to the `daily` table in the database.  
This section is therefore used to get an overall view of the database and its tables.
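For readers who want to see how a derived column like `TotalActiveMinutes` could be added in SQLite, here is a minimal sketch. The report does not state how the column was computed, so the formula used here (the sum of `VeryActiveMinutes`, `FairlyActiveMinutes`, and `LightlyActiveMinutes`) is an assumption, and the rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE daily (
        Id INT,
        VeryActiveMinutes INT,
        FairlyActiveMinutes INT,
        LightlyActiveMinutes INT
    )
    """
)
# Made-up rows for illustration.
con.executemany(
    "INSERT INTO daily VALUES (?, ?, ?, ?)",
    [(1, 10, 20, 30), (2, 0, 5, 100)],
)

# Add the derived column, then fill it for every existing row.
# Assumed definition: very + fairly + lightly active minutes.
con.execute("ALTER TABLE daily ADD COLUMN TotalActiveMinutes INT")
con.execute(
    """
    UPDATE daily
    SET TotalActiveMinutes =
        VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes
    """
)
con.commit()

totals = con.execute(
    "SELECT Id, TotalActiveMinutes FROM daily ORDER BY Id"
).fetchall()
print(totals)
```

If any of the three source columns is `NULL`, the sum is `NULL` as well, so wrapping each term in `COALESCE(column, 0)` may be appropriate depending on how missing values should be treated.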

### Looking At All Tables, Their Columns & Datatypes

In [7]:
get_all_pragma_tables(cur)

|:::::::::::::::::::::::::::::::::::::ALL PRAGMA TABLES::::::::::::::::::::::::::::::::::::::::|
|table_name                    |cid|col_name                 |type      |notnull|dflt_values|pk|
| :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: |
|daily                         |0  |Id                       |INT       |0      |None       |0 |
|daily                         |1  |ActivityDate             |NUM       |0      |None       |0 |
|daily                         |2  |TotalSteps               |INT       |0      |None       |0 |
|daily                         |3  |TotalDistance            |NUM       |0      |None       |0 |
|daily                         |4  |TrackerDistance          |NUM       |0      |None       |0 |
|daily                         |5  |LoggedActivitiesDistance |NUM       |0      |None       |0 |
|daily                         |6  |VeryActiveDistance       |NUM       |0      |None       |0 |
|daily                        

## Analysis

### Analysis Summary

## Actions

## Appendix

### Definitions

### Files Used
| Description | File |
| :---------- | :--: |
| Jupyter notebook used for data concatenation, cleaning, and some feature engineering. | [Data_Concatenation.ipynb](https://github.com/hannankhan888/Data_Science_Portfolio/tree/main/Wellness_Tech_Company_Analysis_Case_Study/Data_Concatenation.ipynb) |
| Folder containing large SQL queries used in this project. | [/SQL_queries/](https://github.com/hannankhan888/Data_Science_Portfolio/tree/main/Wellness_Tech_Company_Analysis_Case_Study/SQL_queries) |
| This report in Jupyter notebook format. | [Wellness_Tech_Company_Analysis_Report.ipynb](https://github.com/hannankhan888/Data_Science_Portfolio/tree/main/Wellness_Tech_Company_Analysis_Case_Study/Wellness_Tech_Company_Analysis_Report.ipynb) |