`Visualizing Segments`

For Task 3, the goal is to develop a report that analyzes user segmentation based on demographics (e.g., age group) or behavior (e.g., new vs. returning visitors) and then suggests ways to tailor content to each segment. Here's how to approach this task:

### 1. **User Segmentation: Demographics and Behavior**

You already have the following demographic and behavioral information from your dataset:
- **Age Group**: This is important for understanding what types of content are favored by different age segments.
- **User Type (New vs. Returning)**: This helps identify how loyal or engaged users are and which content encourages them to return.
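
If a dataset only logs sessions and has no ready-made `UserType` column, one common rule is to call a user's earliest session "New" and every later session "Returning". The sketch below applies that rule to made-up sessions; the rule itself and the `DerivedUserType` column name are assumptions, not how this dataset's `UserType` was necessarily built.

```python
import pandas as pd

# Toy session log with made-up values; column names mirror the dataset.
sessions = pd.DataFrame({
    "UserID": [1, 1, 2, 3, 3, 3],
    "SessionID": ["S1", "S2", "S3", "S4", "S5", "S6"],
    "DateTime": pd.to_datetime([
        "2024-06-01 10:00", "2024-06-03 09:00", "2024-06-02 12:00",
        "2024-06-01 08:00", "2024-06-01 20:00", "2024-06-05 07:00",
    ]),
})

# Assumed rule: a user's earliest session counts as "New", every later one as "Returning".
sessions = sessions.sort_values(["UserID", "DateTime"])
first_seen = sessions.groupby("UserID")["DateTime"].transform("min")
is_repeat = sessions["DateTime"] > first_seen
sessions["DerivedUserType"] = is_repeat.map({False: "New", True: "Returning"})

print(sessions[["UserID", "SessionID", "DerivedUserType"]])
```

Under this rule a single user contributes sessions to both segments, so segment sizes should be counted per session or per user explicitly.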

You can also extend this to:
- **Location**: Segment users based on their geographical location to understand regional preferences.
- **Device Type**: Segment users based on the device (Mobile, Desktop, Tablet) they are using to consume the content. This can provide insights into platform preferences and help optimize the user experience.
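
When the raw data only has an `Age` column, age-group segments can be derived with `pd.cut`. This sketch uses made-up ages and bin edges chosen to reproduce the labels that appear in the notebook output later in this document (13-17, 18-24, 25-34, 35-44, 45+); the exact edges are an assumption about how those groups were defined.

```python
import pandas as pd

# Made-up ages for illustration.
ages = pd.Series([13, 17, 18, 24, 25, 34, 35, 44, 45, 70])

# Right-inclusive intervals: (12, 17], (17, 24], (24, 34], (34, 44], (44, inf).
# Ages of 12 or below fall outside every bin and become NaN.
bins = [12, 17, 24, 34, 44, float("inf")]
labels = ["13-17", "18-24", "25-34", "35-44", "45+"]
age_group = pd.cut(ages, bins=bins, labels=labels)

print(age_group.tolist())
```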

#### Example of User Segmentation

The figures below are illustrative placeholders, not results from the dataset; replace them with metrics computed from your own data (the dataset analyzed below uses the age groups 13-17, 18-24, 25-34, 35-44 and 45+).

| **Segment**         | **Metric**              | **Insights** |
|---------------------|-------------------------|--------------|
| Age Group (13-18)   | Avg Time Spent: 5 min   | Younger users spend less time; they may prefer short-form content or visuals. |
| Age Group (19-29)   | Avg Time Spent: 12 min  | Users in this older group spend more time and may engage more with story-driven content. |
| New Visitors        | Bounce Rate: 60%        | High bounce rate, possibly due to unclear navigation or unengaging initial content. |
| Returning Visitors  | Bounce Rate: 30%        | Lower bounce rate, indicating better engagement; these users may respond well to serialized content. |
| Mobile Users        | Page Views: 10,000      | Optimize for mobile-friendly designs and fast loading times. |
| Desktop Users       | Page Views: 5,000       | Can support more detailed, high-resolution visuals or more complex interactions. |
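
Per-segment metrics like those in the table can be computed with a single `groupby` and named aggregation. This is a minimal sketch on made-up rows: the column names match the dataset (`DeviceType`, `PageViews`, `AvgTimeSpent`, `BounceRate`), and `BounceRate` is assumed to be a per-session 0/1 flag, so its mean is the share of bounced sessions. Swapping `"DeviceType"` for `"AgeGroup"`, `"UserType"` or `"Location"` gives the other segments.

```python
import pandas as pd

# Made-up session rows; BounceRate is assumed to be 1 for a bounced session, else 0.
df = pd.DataFrame({
    "UserID":       [1, 1, 2, 3, 4, 5],
    "SessionID":    ["S1", "S2", "S3", "S4", "S5", "S6"],
    "DeviceType":   ["Mobile", "Mobile", "Desktop", "Mobile", "Tablet", "Desktop"],
    "PageViews":    [10, 4, 12, 6, 8, 20],
    "AvgTimeSpent": [5.0, 3.0, 12.0, 4.0, 7.0, 10.0],
    "BounceRate":   [0, 1, 0, 1, 0, 0],
})

# Drop exact duplicate rows so repeated records are not double-counted.
df = df.drop_duplicates()

# One row per segment: distinct users, total page views,
# mean time per session, and share of bounced sessions.
segment_metrics = df.groupby("DeviceType").agg(
    Users=("UserID", "nunique"),
    PageViews=("PageViews", "sum"),
    AvgTimeSpent=("AvgTimeSpent", "mean"),
    BounceRate=("BounceRate", "mean"),
)
print(segment_metrics)
```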

### 2. **Visualizing Segments**

To understand user behavior better, create visualizations for the different segments. Here are a few ideas:

#### Pie Chart: User Distribution by Age Group
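```python
import plotly.express as px

# df_combined has several rows per user, so count unique users per age group first.
age_counts = df_combined.groupby('AgeGroup')['UserID'].nunique().reset_index(name='UserCount')

# Pie chart for Age Group distribution
fig_age = px.pie(age_counts, values='UserCount', names='AgeGroup', title='User Distribution by Age Group')
fig_age.show()
```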

#### Bar Chart: Returning vs. New Users
```python
# Bar chart for New vs Returning users (count distinct users, not rows)
user_type_counts = df_combined.groupby('UserType')['UserID'].nunique().reset_index(name='UserCount')

fig_user_type = px.bar(user_type_counts, x='UserType', y='UserCount', title='New vs Returning Users')
fig_user_type.show()
```

#### Heatmap: Device Type by Age Group
```python
# Heatmap for Device Type usage by Age Group (distinct users per cell)
device_age_group = (df_combined.groupby(['DeviceType', 'AgeGroup'])['UserID']
                    .nunique()
                    .reset_index(name='UserCount'))
fig_device_age = px.density_heatmap(device_age_group, x='DeviceType', y='AgeGroup', z='UserCount',
                                    histfunc='sum', title='Device Type Usage by Age Group')
fig_device_age.show()
```
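
`px.density_heatmap` works from the long-form table. If you also want the same numbers as a readable AgeGroup × DeviceType grid for the report (or as input to `px.imshow`), a pivot does it. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; user 1 has two sessions in the same segment.
df = pd.DataFrame({
    "UserID":     [1, 1, 2, 3, 4, 5],
    "DeviceType": ["Mobile", "Mobile", "Desktop", "Mobile", "Tablet", "Desktop"],
    "AgeGroup":   ["13-17", "13-17", "25-34", "18-24", "25-34", "25-34"],
})

# Rows = AgeGroup, columns = DeviceType, cells = distinct users.
# Combinations with no users become 0 instead of NaN.
matrix = (
    df.groupby(["AgeGroup", "DeviceType"])["UserID"]
      .nunique()
      .unstack(fill_value=0)
)
print(matrix)
```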

### 3. **Tailoring Content for Segments**

After analyzing the data, you can suggest content changes or strategies based on the insights from each segment:

#### 1. **Age Group (13-18)**
   - **Content Strategy**: Focus on visually appealing, short-form content like webtoon highlights, teasers, or character profiles.
   - **UI/UX**: Ensure a mobile-first experience with fast loading times, simple navigation, and engaging interactive elements (e.g., polls or quizzes).

#### 2. **Age Group (19-29)**
   - **Content Strategy**: Create deep, serialized content that dives into story arcs and character development. Introduce community-driven content, like discussions or fan theories.
   - **UI/UX**: Enable more customization options like saving favorite chapters or receiving recommendations based on previous reads.

#### 3. **New Visitors**
   - **Content Strategy**: Display featured or trending content to capture attention immediately. Use hooks in the first chapters to reduce bounce rates.
   - **UI/UX**: Optimize the onboarding flow with a clear call to action (CTA), such as "Sign up to continue reading" or "Explore more."

#### 4. **Returning Visitors**
   - **Content Strategy**: Offer serialized content or exclusive early releases to keep them engaged. Personalized recommendations based on previously read content can increase loyalty.
   - **UI/UX**: Make it easy for them to pick up where they left off, possibly with notifications for new chapter releases.

#### 5. **Mobile Users**
   - **Content Strategy**: Prioritize mobile-friendly content, such as vertical scrolling webtoons. Mobile-exclusive content or early access for mobile users could increase engagement.
   - **UI/UX**: Ensure a seamless, lag-free experience on mobile devices. Implement touch-friendly navigation and fast-loading images.
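
For the returning-visitor idea of letting users pick up where they left off, the data needed is each user's most recent title. This sketch uses made-up rows with the dataset's `UserID`, `UserType`, `DateTime` and `WebtoonID` columns; it works at the webtoon level only, since the dataset has no chapter column, and treating "latest session is Returning" as the filter is an assumption.

```python
import pandas as pd

# Made-up reading log; column names mirror the dataset.
log = pd.DataFrame({
    "UserID":    [1, 1, 2, 2, 3],
    "UserType":  ["New", "Returning", "New", "Returning", "New"],
    "DateTime":  pd.to_datetime(["2024-06-01 10:00", "2024-06-04 09:00",
                                 "2024-06-02 12:00", "2024-06-03 18:00",
                                 "2024-06-05 07:00"]),
    "WebtoonID": ["W221", "W305", "W118", "W118", "W042"],
})

# Keep each user's most recent row (order by time, then last row per user).
latest = log.sort_values("DateTime").groupby("UserID").tail(1)

# Restrict to users whose latest session is a returning visit.
continue_reading = latest.loc[latest["UserType"] == "Returning", ["UserID", "WebtoonID"]]
print(continue_reading)
```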

### 4. **Report Structure**

Your final report could follow this structure:

1. **Introduction**
   - Brief overview of the website and content being analyzed.
   - Purpose of the segmentation and report.

2. **User Segmentation**
   - Overview of segments (e.g., age group, user type, device type).
   - Key metrics and insights for each segment.

3. **Visualizations**
   - Include pie charts, bar graphs, and heatmaps to show user distribution and behavior.

4. **Content Recommendations**
   - Suggest ways to tailor content for different segments.
   - Recommendations for UI/UX optimizations based on segment behavior.

5. **Conclusion**
   - Summarize key findings and actionable insights.

By following this approach, you will be able to create a detailed report that provides both analytical insights and strategic recommendations tailored to different user segments.

```python
import pandas as pd
import plotly.express as px

df_combined = pd.read_csv("D:/Internship Assignment/Task3/combined_data.csv")
# Pie chart for Age Group distribution
# fig_age = px.pie(df_combined, values='UserCount', names='AgeGroup', title='User Distribution by Age Group')
# fig_age.show()

# Count the number of users by AgeGroup
user_counts_by_age = df_combined.groupby('AgeGroup')['UserID'].nunique().reset_index()

# Rename the column for clarity
user_counts_by_age.columns = ['AgeGroup', 'UserCount']

print(user_counts_by_age)
```

```
  AgeGroup  UserCount
0    13-17        478
1    18-24        608
2    25-34        739
3    35-44        730
4      45+        547
```


```python
# Cleanup steps used after merging (currently commented out):
# df_combined.drop(columns=['Unnamed: 0'], inplace=True)
#
# if 'UserType_x' in df_combined.columns:
#     # Keep 'UserType_x' as 'UserType' and drop the merge duplicates
#     df_combined['UserType'] = df_combined['UserType_x']  # or use 'UserType_y'
#     df_combined = df_combined.drop(columns=['UserType_x', 'UserType_y'], errors='ignore')

# Save the data back to disk (note: this overwrites the file that was read in)
df_combined.to_csv("D:/Internship Assignment/Task3/combined_data.csv", index=False)
```
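
The commented-out cleanup above handles one `_x`/`_y` pair by hand. A slightly more general sketch uses `Series.combine_first` to merge any such pair into a single column. `coalesce_suffixes` is a hypothetical helper name, and preferring `_x` (falling back to `_y` only where `_x` is missing) is an assumption about which merge source is authoritative.

```python
import pandas as pd


def coalesce_suffixes(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: merge `<column>_x` and `<column>_y` into `<column>`.

    Values from `_x` win; `_y` only fills rows where `_x` is missing.
    Returns a new DataFrame; the input is left unchanged.
    """
    x, y = f"{column}_x", f"{column}_y"
    if x in df.columns and y in df.columns:
        df = df.assign(**{column: df[x].combine_first(df[y])})
        df = df.drop(columns=[x, y])
    return df


# Made-up example: a merge left two UserType columns, each with a gap.
merged = pd.DataFrame({
    "UserID": [1, 2, 3],
    "UserType_x": ["New", None, "Returning"],
    "UserType_y": ["New", "Returning", None],
})
cleaned = coalesce_suffixes(merged, "UserType")
print(cleaned)
```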

```python
# Count the number of users by UserType
user_counts_by_type = df_combined.groupby('UserType')['UserID'].nunique().reset_index()

# Rename the column for clarity
user_counts_by_type.columns = ['UserType', 'UserCount']

print(user_counts_by_type)
```

```
    UserType  UserCount
0        New        928
1  Returning        903
```
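
Because these counts use `nunique` within each group, a user with both New and Returning sessions is counted in both rows: the two counts sum to 1,831 (928 + 903), while the total number of unique users computed further down is 995. A sketch for measuring that overlap, on made-up rows:

```python
import pandas as pd

# Made-up sessions: user 1 has both a New and a Returning session.
df = pd.DataFrame({
    "UserID":   [1, 1, 2, 3],
    "UserType": ["New", "Returning", "New", "Returning"],
})

# Distinct UserType values per user; more than one means the user is in both segments.
types_per_user = df.groupby("UserID")["UserType"].nunique()
overlap_users = types_per_user[types_per_user > 1].index.tolist()

print(f"{len(overlap_users)} of {df['UserID'].nunique()} users appear in both segments: {overlap_users}")
```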


```python
# Count users by both AgeGroup and UserType
user_counts_multi = df_combined.groupby(['AgeGroup', 'UserType'])['UserID'].nunique().reset_index()

# Rename the column for clarity
user_counts_multi.columns = ['AgeGroup', 'UserType', 'UserCount']

print(user_counts_multi)
```

```
  AgeGroup   UserType  UserCount
0    13-17        New        291
1    13-17  Returning        261
2    18-24        New        389
3    18-24  Returning        369
4    25-34        New        517
5    25-34  Returning        465
6    35-44        New        475
7    35-44  Returning        480
8      45+        New        323
9      45+  Returning        305
```
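
The long table is easier to compare across age groups when pivoted wide. This sketch reuses the counts printed above; the `ReturningPerNew` ratio is an added illustrative column, and since a user can be counted under both types, it compares segment sizes rather than individual users. In these counts, 35-44 is the only age group with more returning than new users (480 vs. 475).

```python
import pandas as pd

# Counts copied from the output above.
user_counts_multi = pd.DataFrame({
    "AgeGroup": ["13-17", "13-17", "18-24", "18-24", "25-34", "25-34",
                 "35-44", "35-44", "45+", "45+"],
    "UserType": ["New", "Returning"] * 5,
    "UserCount": [291, 261, 389, 369, 517, 465, 475, 480, 323, 305],
})

# One row per age group, one column per user type.
wide = user_counts_multi.pivot(index="AgeGroup", columns="UserType", values="UserCount")

# Ratio of returning to new users within each age group.
wide["ReturningPerNew"] = (wide["Returning"] / wide["New"]).round(2)
print(wide)
```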


```python
# Total number of unique users
total_user_count = df_combined['UserID'].nunique()

# Add a new column with the total user count across all rows
# (every row gets the same value, 995 here, so this is not a per-segment count)
df_combined['UserCount'] = total_user_count

print(df_combined.head())
```

```
   UserID SessionID             DateTime WebtoonID  PageViews  AvgTimeSpent  \
0     277     S1474  2024-06-20 05:16:29      W221         19          9.17   
1     277     S1474  2024-06-20 05:16:29      W221         19          9.17   
2     277     S1474  2024-06-20 05:16:29      W221         19          9.17   
3     277     S1474  2024-06-20 05:16:29      W221         19          9.17   
4     277     S1474  2024-06-20 05:16:29      W221         19          9.17   

   BounceRate DeviceType Location  Age TestGroup AgeGroup   UserType  \
0           1     Tablet       UK   21         B    18-24  Returning   
1           1     Tablet       UK   21         B    18-24  Returning   
2           1     Tablet       UK   21         B    18-24  Returning   
3           1     Tablet       UK   21         B    18-24  Returning   
4           1     Tablet       UK   21         B    18-24  Returning   

   UserCount  
0        995  
1        995  
2        995  
3        995  
4        995  
```
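
Two things in these outputs are worth checking before building charts: the five rows printed above are identical, and the age-group counts earlier sum to 3,102 (478 + 608 + 739 + 730 + 547) while there are only 995 unique users, so some `UserID`s are associated with more than one age group. That may point to a many-to-many merge. A quick check, sketched on made-up data that has both problems:

```python
import pandas as pd

# Made-up rows: rows 0 and 1 are exact duplicates, and user 1 appears in two age groups.
df = pd.DataFrame({
    "UserID":    [1, 1, 1, 2],
    "SessionID": ["S1", "S1", "S2", "S3"],
    "AgeGroup":  ["18-24", "18-24", "25-34", "45+"],
})

# 1) Exact duplicate rows (every column identical).
n_duplicates = int(df.duplicated().sum())

# 2) Users mapped to more than one age group.
groups_per_user = df.groupby("UserID")["AgeGroup"].nunique()
inconsistent_users = groups_per_user[groups_per_user > 1].index.tolist()

print(f"duplicate rows: {n_duplicates}")
print(f"users in more than one age group: {inconsistent_users}")

# Dropping exact duplicates is safe; resolving conflicting age groups needs a rule
# (for example, re-deriving AgeGroup from Age) and is left to the analyst.
deduped = df.drop_duplicates()
```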


```python
fig_age = px.pie(user_counts_by_age, values='UserCount', names='AgeGroup', title='User Distribution by Age Group')
fig_age.show()
```


```python
fig_user_type = px.bar(user_counts_multi, x='AgeGroup', y='UserCount', color='UserType', barmode='group',
                       title='User Distribution by Age Group and User Type')
fig_user_type.show()
```
