Data wrangling and Feature analysis based on the Netflix userbase sample Dataset.

shahriar-rahman/Exploratory-Analysis-of-Netflix-Userbase

===========================================================================

Exploratory Analysis of Netflix User Base Data

Using data analysis and visualization tools such as Matplotlib, Pandas, missingno, and Seaborn, this notebook conducts an in-depth visual exploration of the dataset to uncover key insights about age and gender distribution, subscription types, and more. It may also serve as hands-on practice for beginners in data science.




◘ Introduction

The dataset is a sample of a Netflix user base, covering monthly revenue, subscriptions, activity, and account details. Each row represents a unique user, identified by a user ID, and includes the subscription type (Basic, Standard, or Premium). It also records the monthly revenue generated by the subscription, the date the user joined Netflix (“Join Date”), the date of their last payment (“Last Payment Date”), and the country in which they reside.

Additional columns provide insight into user behavior and preferences, such as the device type (Smart TV, Mobile phone, Desktop, or Tablet), the total watch time (in minutes), and the account status (active or not). The dataset can be used to analyze and model user trends, preferences, and revenue generation within a hypothetical Netflix user base.
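
To make the column layout concrete, here is a minimal sketch that builds a two-row stand-in for the dataset with pandas. The column names follow the description above but are assumptions (the actual CSV headers may differ), the values are made up for illustration, and the day-month-year date format is also assumed.

```python
import pandas as pd

# Hypothetical rows; column names mirror the description above and
# may not match the real CSV headers exactly.
sample = pd.DataFrame(
    {
        "User ID": [1, 2],
        "Subscription Type": ["Basic", "Premium"],
        "Monthly Revenue": [10, 15],
        "Join Date": ["15-01-22", "05-09-21"],
        "Last Payment Date": ["10-06-23", "22-06-23"],
        "Country": ["United States", "Canada"],
        "Device": ["Smart TV", "Tablet"],
    }
)

# Parse the date columns explicitly; the day-month-year format is an assumption.
for col in ["Join Date", "Last Payment Date"]:
    sample[col] = pd.to_datetime(sample[col], format="%d-%m-%y")

print(sample.dtypes)
```

Parsing the dates up front means later steps can subtract or compare them directly instead of working with strings.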



◘ Objective

The primary objectives of this research are to:

  • Process the dataset by checking its integrity, missing values, duplicated values, and so forth.
  • Perform clean-ups, if required, and improve accessibility for more convenient exploratory analysis.
  • Conduct exploratory analysis using a range of graphing tools to reach a conclusion.
  • Decide which model to apply to the processed dataset in a future project, with the aim of better tuning and, hopefully, better generalization.
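
The integrity checks in the first two objectives could be sketched roughly as follows. This is an illustration on a tiny made-up frame, not the notebook's actual code; the column names are assumptions, and dropping rows is only one of several possible clean-up strategies.

```python
import pandas as pd

# Tiny made-up frame with one missing value and one duplicated row.
df = pd.DataFrame(
    {
        "User ID": [1, 2, 2, 3],
        "Subscription Type": ["Basic", "Premium", "Premium", None],
        "Monthly Revenue": [10, 15, 15, 12],
    }
)

missing_per_column = df.isna().sum()          # count of missing values per column
duplicate_rows = int(df.duplicated().sum())   # fully identical rows after the first

# One possible clean-up: drop exact duplicates, then rows with missing values.
cleaned = df.drop_duplicates().dropna().reset_index(drop=True)

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(cleaned)
```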






◘ Approach

This research is divided into two steps:

  1. Data Wrangling: the dataset is extracted, tested, cleaned, processed, and stored in memory.
  2. Feature Analysis: the processed data is explored thoroughly to obtain viable insights.



◘ Methodologies & Technologies applied

  • Diagnose and fix structural errors
  • Check and Clean data
  • Address duplicates & outliers
  • Logical feature amalgamation to construct a unique variable
  • Univariate inspection
  • Bivariate inspection
  • Feature correlations
  • Seaborn & Matplotlib visualizations
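
As a rough illustration of the feature amalgamation, univariate, bivariate, and correlation steps listed above, the sketch below works on made-up data. The column names are assumptions based on the dataset description, and "Tenure Days" is a hypothetical derived feature chosen for this example, not necessarily the one built in the notebook.

```python
import pandas as pd

# Made-up values; column names are assumptions based on the dataset description.
df = pd.DataFrame(
    {
        "Subscription Type": ["Basic", "Premium", "Basic", "Standard"],
        "Monthly Revenue": [10, 15, 12, 13],
        "Join Date": pd.to_datetime(
            ["2022-01-15", "2021-09-05", "2022-03-01", "2022-06-10"]
        ),
        "Last Payment Date": pd.to_datetime(
            ["2023-06-10", "2023-06-22", "2023-06-01", "2023-06-15"]
        ),
    }
)

# Feature amalgamation: combine two date columns into one derived variable.
# "Tenure Days" is a hypothetical name, not taken from the notebook.
df["Tenure Days"] = (df["Last Payment Date"] - df["Join Date"]).dt.days

# Univariate inspection: distribution of a single categorical feature.
subscription_counts = df["Subscription Type"].value_counts()

# Bivariate inspection: mean revenue per subscription type.
revenue_by_type = df.groupby("Subscription Type")["Monthly Revenue"].mean()

# Feature correlations across the numeric columns only.
correlations = df[["Monthly Revenue", "Tenure Days"]].corr()

print(subscription_counts, revenue_by_type, correlations, sep="\n\n")
```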



◘ Project Flowchart

[Figure: project flowchart]



◘ Required Modules

  • pandas 2.0.3
  • missingno 0.5.2
  • matplotlib 3.7.0
  • seaborn 0.12.2
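
One way to compare an environment against the versions listed above is the standard-library importlib.metadata module. The sketch below is only a convenience check; the helper name installed_version is made up for this example, and a differing version does not necessarily mean the notebook will fail.

```python
from importlib import metadata

# Versions listed under "Required Modules"; keys are the PyPI distribution names.
required = {
    "pandas": "2.0.3",
    "missingno": "0.5.2",
    "matplotlib": "3.7.0",
    "seaborn": "0.12.2",
}


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name, expected in required.items():
    found = installed_version(name)
    status = "ok" if found == expected else "differs"
    print(f"{name}: expected {expected}, found {found} ({status})")
```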

◘ Jupyter core packages

  • IPython : 8.10.0
  • ipykernel : 6.19.2
  • ipywidgets : 7.6.5
  • jupyter_client : 7.3.4
  • jupyter_core : 5.2.0
  • jupyter_server : 1.23.4
  • jupyterlab : 3.5.3



◘ Project Organization


├── LICENSE
│
├── README.md          <- The top-level README for developers using this project.
│
├── data
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks for EDA
│
├── figures            <- Generated graphics and figures to be used in reporting using Jupyter Notebooks
│
├── img                <- Project related files
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io
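
For code that needs to read from or write to this layout, the folders can be addressed with pathlib. The following is a minimal sketch; the helper name data_paths is hypothetical, and it runs against a temporary directory rather than a real checkout.

```python
from pathlib import Path
import tempfile


def data_paths(project_root):
    """Map the data and figure folders from the tree above to Path objects."""
    root = Path(project_root)
    return {
        "raw": root / "data" / "raw",
        "processed": root / "data" / "processed",
        "figures": root / "figures",
    }


# Demonstrate against a throwaway directory instead of a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    paths = data_paths(tmp)
    for folder in paths.values():
        folder.mkdir(parents=True, exist_ok=True)
    created = sorted(p.relative_to(tmp).as_posix() for p in paths.values())

print(created)
```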



◘ Installation (using pip)

In Jupyter, a shell command can be run from a code cell by prefixing it with ‘!’. To install a module into the same environment the notebook kernel is running in, call pip through sys.executable, the path of the current Python interpreter from the standard sys module:

import sys
!{sys.executable} -m pip install [package_name]
  1. For Pandas, run:
!{sys.executable} -m pip install pandas
  2. To install missingno:
!{sys.executable} -m pip install missingno
  3. Matplotlib can be installed by running the following command:
!{sys.executable} -m pip install matplotlib
  4. Lastly, for seaborn:
!{sys.executable} -m pip install seaborn
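
Before installing anything, it can help to see which of these modules are already importable. The sketch below uses the standard-library importlib.util.find_spec for that; the helper name missing_modules is made up for this example, and the script only prints the pip commands instead of running them.

```python
import importlib.util
import sys


def missing_modules(module_names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


needed = ["pandas", "missingno", "matplotlib", "seaborn"]
for name in missing_modules(needed):
    # Print the command rather than running it, so nothing is installed implicitly.
    print(f"{sys.executable} -m pip install {name}")
```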



◘ Import Packages

To import the dependencies, open the preferred IDE or notebook and run the following:

  1. For Pandas, run the following command:
import pandas as pd
  2. To use missingno, run:
import missingno as msn
  3. Import matplotlib using:
import matplotlib.pyplot as plt
  4. Seaborn can be accessed by:
import seaborn as sns
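
Once the imports succeed, a quick smoke test can confirm that pandas, Matplotlib, and Seaborn work together. The example below is a sketch on made-up data with an assumed column name. It selects the non-interactive Agg backend so it also runs without a display; inside a notebook that line can be dropped so the figure renders inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values for illustration only.
df = pd.DataFrame({"Subscription Type": ["Basic", "Premium", "Basic", "Standard"]})

fig, ax = plt.subplots(figsize=(4, 3))
sns.countplot(data=df, x="Subscription Type", ax=ax)  # one bar per category
ax.set_title("Users per subscription type")
fig.tight_layout()
```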



◘ Supplementary Resources



◘ License

This is free and unencumbered software released into the public domain. Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.



===========================================================================
