## Streaming Service User Analysis

### Import libraries

In [3]:
import pandas as pd
import numpy as np

### Reading the CSV file

In [5]:
stream_Data=pd.read_csv("streaming_service_data.csv")
stream_Data.head()

Unnamed: 0,User_ID,User_Name,Join_Date,Last_Login,Monthly_Price,Watch_Hours,Favorite_Genre,Active_Devices,Profile_Count,Parental_Controls,...,Payment_Method,Language_Preference,Recommended_Content_Count,Average_Rating_Given,Has_Downloaded_Content,Membership_Status,Loyalty_Points,First_Device_Used,Age_Group,Primary_Watch_Time
0,2518,Amber,5/15/2023,12/13/2024,7.99,49,Action,3,6,True,...,PayPal,Hindi,84,3.3,False,Active,2878,Smartphone,35-44,Late Night
1,6430,Patrick,4/3/2023,12/15/2024,7.99,161,Drama,1,2,True,...,PayPal,German,69,4.0,False,Active,2291,Desktop,25-34,Evening
2,1798,Robert,8/2/2023,12/14/2024,11.99,87,Action,2,5,False,...,Debit Card,Mandarin,56,3.1,False,Active,1692,Desktop,35-44,Late Night
3,5255,Cole,1/31/2023,12/2/2024,15.99,321,Sci-Fi,1,5,False,...,Debit Card,Mandarin,47,4.6,False,Active,952,Desktop,25-34,Evening
4,2854,Jamie,6/6/2023,12/15/2024,11.99,386,Documentary,1,4,True,...,PayPal,Mandarin,39,3.7,False,Active,1823,Desktop,25-34,Late Night


In [6]:
stream_Data.shape

(1000, 23)

In [7]:
stream_Data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 23 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   User_ID                    1000 non-null   int64  
 1   User_Name                  1000 non-null   object 
 2   Join_Date                  1000 non-null   object 
 3   Last_Login                 1000 non-null   object 
 4   Monthly_Price              1000 non-null   float64
 5   Watch_Hours                1000 non-null   int64  
 6   Favorite_Genre             1000 non-null   object 
 7   Active_Devices             1000 non-null   int64  
 8   Profile_Count              1000 non-null   int64  
 9   Parental_Controls          1000 non-null   bool   
 10  Total_Movies_Watched       1000 non-null   int64  
 11  Total_Series_Watched       1000 non-null   int64  
 12  Country                    1000 non-null   object 
 13  Payment_Method             1000 non-null   object
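
`info()` reports `Join_Date` and `Last_Login` as `object`, so they are stored as plain strings. The sketch below shows one way they could be parsed, assuming the month/day/year layout seen in the `head()` preview. The two sample rows are copied from that preview, and `Tenure_Days` is a hypothetical column name used only for illustration.

```python
import pandas as pd

# Two rows copied from the head() preview; not the full dataset.
sample = pd.DataFrame({
    'Join_Date': ['5/15/2023', '4/3/2023'],
    'Last_Login': ['12/13/2024', '12/15/2024'],
})

# Parse the strings as month/day/year (assumed from values such as 5/15/2023).
for col in ['Join_Date', 'Last_Login']:
    sample[col] = pd.to_datetime(sample[col], format='%m/%d/%Y')

# Hypothetical derived column: days between joining and the last login.
sample['Tenure_Days'] = (sample['Last_Login'] - sample['Join_Date']).dt.days

print(sample.dtypes)
print(sample)
```

Once the columns are real datetimes, date arithmetic and grouping by month or year become straightforward.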

### Data cleaning

#### Understanding the dataset

In [10]:
stream_Data.columns

Index(['User_ID', 'User_Name', 'Join_Date', 'Last_Login', 'Monthly_Price',
       'Watch_Hours', 'Favorite_Genre', 'Active_Devices', 'Profile_Count',
       'Parental_Controls', 'Total_Movies_Watched', 'Total_Series_Watched',
       'Country', 'Payment_Method', 'Language_Preference',
       'Recommended_Content_Count', 'Average_Rating_Given',
       'Has_Downloaded_Content', 'Membership_Status', 'Loyalty_Points',
       'First_Device_Used', 'Age_Group', 'Primary_Watch_Time'],
      dtype='object')

## 📄 Dataset Description

This dataset contains user information from a streaming service. Below is a description of each column:

| Column Name | Description |
|-------------|-------------|
| **User_ID** | Unique identifier for each user. |
| **User_Name** | The name or username of the user. |
| **Join_Date** | The date when the user created their account. |
| **Last_Login** | The most recent date the user logged into the platform. |
| **Monthly_Price** | The monthly subscription price paid by the user. |
| **Watch_Hours** | Total number of hours the user has spent watching content. |
| **Favorite_Genre** | The genre the user watches most often: Action, Comedy, Documentary, Drama, Horror, Romance, or Sci-Fi. |
| **Active_Devices** | Number of devices currently linked to the user account. |
| **Profile_Count** | Number of user profiles under one account. |
| **Parental_Controls** | Whether parental control settings are enabled (True or False). |
| **Total_Movies_Watched** | Total number of movies watched by the user. |
| **Total_Series_Watched** | Total number of series watched by the user. |
| **Country** | The user's country of residence: Australia, Canada, France, Germany, India, UK, or USA. |
| **Payment_Method** | The payment method used: Credit Card, Cryptocurrency, Debit Card, or PayPal. |
| **Language_Preference** | The preferred content/interface language: English, French, German, Hindi, Mandarin, or Spanish. |
| **Recommended_Content_Count** | Number of recommended contents the user has received. |
| **Average_Rating_Given** | Average rating the user has given to content. |
| **Has_Downloaded_Content** | Whether the user has downloaded content (True or False). |
| **Membership_Status** | The user's membership status; every row in this dataset is Active. |
| **Loyalty_Points** | Loyalty points accumulated by the user. |
| **First_Device_Used** | The first device used to access the platform: Desktop, Laptop, Smart TV, Smartphone, or Tablet. |
| **Age_Group** | The age bracket the user falls into: 18-24, 25-34, 35-44, 45-54, or 55+. |
| **Primary_Watch_Time** | The time of day the user most often watches content: Morning, Afternoon, Evening, or Late Night. |


In [12]:
np.unique(stream_Data.Monthly_Price.to_list())

array([ 7.99, 11.99, 15.99])

In [13]:
np.unique(stream_Data.Watch_Hours.to_list())

array([ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19,  20,  21,  22,
        23,  24,  25,  26,  27,  28,  29,  30,  32,  33,  34,  35,  36,
        37,  38,  39,  40,  42,  43,  44,  45,  46,  47,  48,  49,  50,
        52,  53,  54,  55,  56,  57,  58,  59,  61,  62,  63,  64,  65,
        68,  69,  70,  71,  73,  74,  75,  76,  77,  78,  81,  82,  83,
        84,  85,  87,  88,  89,  90,  91,  92,  93,  96,  97,  98,  99,
       100, 101, 102, 103, 104, 105, 106, 108, 109, 110, 112, 113, 114,
       115, 116, 118, 119, 120, 121, 122, 123, 124, 125, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 144,
       145, 146, 147, 148, 150, 152, 153, 154, 155, 157, 158, 159, 160,
       161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173,
       174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186,
       187, 188, 190, 191, 192, 193, 195, 196, 197, 198, 200, 201, 202,
       203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 214, 21

In [14]:
np.unique(stream_Data.Favorite_Genre.to_list())

array(['Action', 'Comedy', 'Documentary', 'Drama', 'Horror', 'Romance',
       'Sci-Fi'], dtype='<U11')

In [15]:
np.unique(stream_Data.Active_Devices.to_list())

array([1, 2, 3, 4, 5])

In [16]:
np.unique(stream_Data.Profile_Count.to_list())

array([1, 2, 3, 4, 5, 6])

In [17]:
np.unique(stream_Data.Parental_Controls.to_list())

array([False,  True])

In [18]:
np.unique(stream_Data.Total_Movies_Watched.to_list())

array([  12,   13,   15,   17,   18,   20,   22,   25,   26,   27,   28,
         29,   30,   31,   33,   38,   39,   40,   42,   44,   49,   50,
         51,   52,   53,   56,   57,   59,   60,   61,   62,   63,   65,
         67,   69,   70,   73,   75,   76,   77,   79,   80,   84,   85,
         89,   90,   91,   92,   93,   95,   96,   97,  102,  103,  104,
        105,  107,  108,  109,  112,  113,  115,  118,  120,  123,  127,
        131,  132,  133,  135,  138,  139,  141,  142,  144,  146,  148,
        150,  151,  154,  155,  156,  158,  159,  160,  161,  162,  163,
        168,  170,  174,  175,  177,  178,  179,  180,  181,  182,  183,
        184,  185,  186,  187,  188,  190,  191,  192,  194,  196,  198,
        201,  202,  203,  205,  206,  207,  208,  209,  213,  214,  216,
        217,  218,  221,  222,  225,  226,  228,  229,  230,  231,  232,
        233,  235,  236,  237,  238,  239,  240,  242,  244,  245,  246,
        247,  253,  255,  256,  257,  260,  263,  2

In [19]:
np.unique(stream_Data.Total_Series_Watched.to_list())

array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
        14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
        27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
        40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
        53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
        79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
        92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103, 104,
       105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
       118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130,
       131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
       144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156,
       157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 168, 169, 170,
       171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 18

In [20]:
np.unique(stream_Data.Country.to_list())

array(['Australia', 'Canada', 'France', 'Germany', 'India', 'UK', 'USA'],
      dtype='<U9')

In [21]:
np.unique(stream_Data.Payment_Method.to_list())

array(['Credit Card', 'Cryptocurrency', 'Debit Card', 'PayPal'],
      dtype='<U14')

In [22]:
np.unique(stream_Data.Language_Preference.to_list())

array(['English', 'French', 'German', 'Hindi', 'Mandarin', 'Spanish'],
      dtype='<U8')

In [23]:
np.unique(stream_Data.Recommended_Content_Count.to_list())

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100])

In [24]:
np.unique(stream_Data.Average_Rating_Given.to_list())

array([3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2,
       4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. ])

In [25]:
np.unique(stream_Data.Has_Downloaded_Content.to_list())

array([False,  True])

In [27]:
np.unique(stream_Data.Membership_Status.to_list())

array(['Active'], dtype='<U6')

In [28]:
np.unique(stream_Data.Loyalty_Points.to_list())

array([   3,    4,   15,   17,   28,   30,   33,   47,   48,   53,   55,
         60,   62,   68,   72,   73,   74,   77,   92,   96,   97,   99,
        105,  106,  110,  111,  119,  125,  130,  136,  144,  147,  150,
        164,  165,  167,  168,  172,  173,  185,  188,  212,  213,  214,
        215,  218,  223,  225,  228,  234,  237,  239,  242,  244,  245,
        249,  255,  256,  260,  274,  290,  292,  296,  314,  318,  333,
        336,  340,  342,  344,  353,  354,  368,  371,  373,  380,  388,
        396,  398,  402,  412,  414,  416,  421,  423,  424,  428,  433,
        444,  447,  448,  450,  459,  460,  473,  476,  484,  486,  494,
        496,  510,  513,  525,  527,  540,  544,  546,  547,  548,  561,
        564,  571,  581,  583,  585,  595,  599,  608,  613,  615,  628,
        631,  633,  639,  647,  650,  658,  668,  670,  674,  681,  708,
        710,  711,  718,  727,  728,  732,  745,  746,  747,  749,  754,
        755,  756,  771,  772,  773,  775,  782,  7
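
The column-by-column checks above could also be run in a single loop. The following is a minimal sketch, not the notebook's own method: `summarize_uniques` and its `max_listed` threshold are hypothetical names, the data frame is a toy stand-in for `stream_Data`, and each column is assumed to hold values of one sortable type.

```python
import pandas as pd

def summarize_uniques(df, max_listed=10):
    """Map each column to its sorted unique values, or to a short summary if there are many."""
    summary = {}
    for col in df.columns:
        values = sorted(df[col].dropna().unique().tolist())
        if len(values) <= max_listed:
            summary[col] = values
        else:
            summary[col] = f"{len(values)} distinct values from {values[0]} to {values[-1]}"
    return summary

# Toy stand-in for stream_Data (values borrowed from the previews above).
toy = pd.DataFrame({
    'Monthly_Price': [7.99, 11.99, 15.99, 7.99],
    'Favorite_Genre': ['Action', 'Drama', 'Action', 'Sci-Fi'],
    'Watch_Hours': [49, 161, 87, 321],
})

for col, info in summarize_uniques(toy, max_listed=3).items():
    print(col, '->', info)
```

Summarizing wide numeric columns by count and range avoids the long, truncated arrays that appear in the cell outputs.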

In [29]:
# Bin edges for Watch_Hours; with the default right=True the intervals are [0, 50], (50, 100], ..., (300, 500]
bins = [0, 50, 100, 200, 300, 500]
labels = ['0–50', '51–100', '101–200', '201–300', '301–500']

# Create a new column for hour groups
stream_Data['Watch_Hour_Group'] = pd.cut(
    stream_Data['Watch_Hours'], bins=bins, labels=labels, include_lowest=True
)

# Count users in each group
hrs_counts = stream_Data['Watch_Hour_Group'].value_counts().sort_index()

# Display the counts
print(hrs_counts)

Watch_Hour_Group
0–50         93
51–100       88
101–200     210
201–300     201
301–500     408
Name: count, dtype: int64


In [30]:
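# Bin edges for Loyalty_Points. pd.cut defaults to right=True, so the intervals are
# [0, 1000], (1000, 2000], ...; a value of exactly 1000 is therefore labelled '0–999'.
loyalty_bins = [0, 1000, 2000, 3000, 4000, 4990]
loyalty_labels = ['0–999', '1000–1999', '2000–2999', '3000–3999', '4000–4990']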

# Group Loyalty Points
stream_Data['Loyalty_Point_Group'] = pd.cut(
    stream_Data['Loyalty_Points'],
    bins=loyalty_bins,
    labels=loyalty_labels,
    include_lowest=True
)

# Count users per group
loyalty_counts = stream_Data['Loyalty_Point_Group'].value_counts().sort_index()
loyalty_counts

Loyalty_Point_Group
0–999         207
1000–1999     215
2000–2999     196
3000–3999     191
4000–4990     191
Name: count, dtype: int64
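
The loyalty labels read as left-closed ranges ("1000–1999"), while `pd.cut` closes each interval on the right by default. The sketch below illustrates the difference on toy values placed exactly on the bin edges; the values and the shortened edge list are assumptions chosen for the demonstration, not data from the file.

```python
import pandas as pd

# Toy values sitting exactly on bin edges (not taken from the dataset).
points = pd.Series([0, 999, 1000, 1999, 2000])

edges = [0, 1000, 2000, 3000]
labels = ['0–999', '1000–1999', '2000–2999']

# Default right=True: intervals are [0, 1000], (1000, 2000], (2000, 3000].
right_closed = pd.cut(points, bins=edges, labels=labels, include_lowest=True)

# right=False: intervals are [0, 1000), [1000, 2000), [2000, 3000).
left_closed = pd.cut(points, bins=edges, labels=labels, right=False)

print(pd.DataFrame({
    'points': points,
    'right=True': right_closed,
    'right=False': left_closed,
}))
```

With `right=False` the labels match the interval boundaries, but the top edge then has to sit above the largest value, because a value equal to the last edge falls outside every bin.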

In [31]:
stream_Data.columns

Index(['User_ID', 'User_Name', 'Join_Date', 'Last_Login', 'Monthly_Price',
       'Watch_Hours', 'Favorite_Genre', 'Active_Devices', 'Profile_Count',
       'Parental_Controls', 'Total_Movies_Watched', 'Total_Series_Watched',
       'Country', 'Payment_Method', 'Language_Preference',
       'Recommended_Content_Count', 'Average_Rating_Given',
       'Has_Downloaded_Content', 'Membership_Status', 'Loyalty_Points',
       'First_Device_Used', 'Age_Group', 'Primary_Watch_Time',
       'Watch_Hour_Group', 'Loyalty_Point_Group'],
      dtype='object')

In [32]:
age_map = {
    '18-24': 'Young Streamers',
    '25-34': 'Adult Streamers',
    '35-44': 'Mature Streamers',
    '45-54': 'Older Streamers',
    '55+': 'Senior Streamers'
}

stream_Data['Age_Category'] = stream_Data['Age_Group'].map(age_map)
stream_Data['Age_Category'].unique()

array(['Mature Streamers', 'Adult Streamers', 'Young Streamers',
       'Older Streamers', 'Senior Streamers'], dtype=object)
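
`Series.map` returns `NaN` for any value that has no key in the dictionary, so it can be worth checking for unmapped labels. A minimal sketch on a toy Series follows; the `'65+'` label is a made-up value that is deliberately missing from `age_map`.

```python
import pandas as pd

age_map = {
    '18-24': 'Young Streamers',
    '25-34': 'Adult Streamers',
    '35-44': 'Mature Streamers',
    '45-54': 'Older Streamers',
    '55+': 'Senior Streamers',
}

# Toy Series; '65+' is hypothetical and has no entry in age_map.
ages = pd.Series(['18-24', '55+', '65+'])
mapped = ages.map(age_map)

# Labels that did not map to any category.
unmapped = ages[mapped.isna()].unique().tolist()
print(unmapped)
```

In the notebook itself, the `unique()` output lists only the five category names, which suggests every `Age_Group` value was mapped.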

In [33]:
stream_Data

Unnamed: 0,User_ID,User_Name,Join_Date,Last_Login,Monthly_Price,Watch_Hours,Favorite_Genre,Active_Devices,Profile_Count,Parental_Controls,...,Average_Rating_Given,Has_Downloaded_Content,Membership_Status,Loyalty_Points,First_Device_Used,Age_Group,Primary_Watch_Time,Watch_Hour_Group,Loyalty_Point_Group,Age_Category
0,2518,Amber,5/15/2023,12/13/2024,7.99,49,Action,3,6,True,...,3.3,False,Active,2878,Smartphone,35-44,Late Night,0–50,2000–2999,Mature Streamers
1,6430,Patrick,4/3/2023,12/15/2024,7.99,161,Drama,1,2,True,...,4.0,False,Active,2291,Desktop,25-34,Evening,101–200,2000–2999,Adult Streamers
2,1798,Robert,8/2/2023,12/14/2024,11.99,87,Action,2,5,False,...,3.1,False,Active,1692,Desktop,35-44,Late Night,51–100,1000–1999,Mature Streamers
3,5255,Cole,1/31/2023,12/2/2024,15.99,321,Sci-Fi,1,5,False,...,4.6,False,Active,952,Desktop,25-34,Evening,301–500,0–999,Adult Streamers
4,2854,Jamie,6/6/2023,12/15/2024,11.99,386,Documentary,1,4,True,...,3.7,False,Active,1823,Desktop,25-34,Late Night,301–500,1000–1999,Adult Streamers
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,6878,Mary,4/26/2024,11/27/2024,7.99,136,Sci-Fi,5,5,True,...,4.4,True,Active,4742,Tablet,35-44,Late Night,101–200,4000–4990,Mature Streamers
996,5681,Ryan,5/8/2024,12/8/2024,11.99,159,Romance,4,6,True,...,3.3,False,Active,2910,Desktop,18-24,Morning,101–200,2000–2999,Young Streamers
997,4448,David,3/23/2024,12/16/2024,11.99,99,Sci-Fi,4,2,False,...,3.6,False,Active,1180,Tablet,35-44,Morning,51–100,1000–1999,Mature Streamers
998,5795,John,11/25/2023,12/13/2024,11.99,157,Action,4,2,False,...,4.4,False,Active,1965,Laptop,25-34,Morning,101–200,1000–1999,Adult Streamers


In [70]:
# Ensure both columns are numeric; errors='coerce' turns any unparseable value into NaN
stream_Data['Watch_Hours'] = pd.to_numeric(stream_Data['Watch_Hours'], errors='coerce')
stream_Data['Loyalty_Points'] = pd.to_numeric(stream_Data['Loyalty_Points'], errors='coerce')
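
Because `errors='coerce'` replaces bad values silently, counting the values that became `NaN` shows whether the conversion lost anything. A small sketch on a toy Series, where the `'n/a'` entry is an invented bad value:

```python
import pandas as pd

# Toy input; 'n/a' is an invented unparseable entry.
raw = pd.Series(['49', '161', 'n/a'])
converted = pd.to_numeric(raw, errors='coerce')

# Values that were present before conversion but are NaN afterwards.
newly_missing = converted.isna() & raw.notna()
print(int(newly_missing.sum()), 'value(s) could not be converted')
```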


In [87]:
stream_Data.to_csv("cleaned_stream_data.csv", index=False, encoding='utf-8')
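
One thing to keep in mind when saving to CSV is that the categorical dtype created by `pd.cut` is not stored in the file. The sketch below demonstrates this with an in-memory buffer and a two-row toy frame instead of the real file.

```python
import io
import pandas as pd

# Toy frame with one binned column, mirroring Watch_Hour_Group.
df = pd.DataFrame({'Watch_Hours': [49, 161]})
df['Watch_Hour_Group'] = pd.cut(
    df['Watch_Hours'],
    bins=[0, 50, 100, 200],
    labels=['0–50', '51–100', '101–200'],
    include_lowest=True,
)

# Write to an in-memory buffer and read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print('before:', df['Watch_Hour_Group'].dtype)
print('after: ', reloaded['Watch_Hour_Group'].dtype)
```

After reloading, the group column holds plain strings, so the bin order would need to be restored (for example with an ordered `pd.Categorical`) before sorting by group.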
