# Overview:

We are utilizing data from an app called Bloc, which is a communication and engagement tool at a few high schools in St. Louis. The data was taken from the first semester of school (115 days). For this project, we wanted to see how our research and models can help Bloc with implementation at other schools, guide feature engineering to boost engagement, and uncover trends to help schools and their communities in the future. To complete the project, we are using 4 different datapoints/features in the app:
- Users
- Events
- Media
- Directory

## Final Results:

Our end goal was to create a model that can predict how Bloc might be used at another school if it were implemented there. We chose Villa Duchesne because it allowed us to observe patterns from schools with similar characteristics to Villa, providing a basis for making informed predictions. To do this, we created dummy (simulated) data for Villa Duchesne and then built a model for each feature. Doing so supports the following:

- Understanding Potential User Engagement: If Villa doesn't have existing usage data (because Bloc is not yet implemented), creating a simulated dataset based on known patterns from similar schools can help predict how their users might engage with the directory.


- Model Training: Predictive models need data to learn from. If Villa has no historical data, the model can't be trained to predict for Villa specifically. By using dummy data that reflects the behavior at similar institutions, we create a proxy that allows the model to learn patterns that might be applicable to Villa.


- Feature Representation: The dummy data for Villa should have similar feature values to those from the other schools to ensure the model learns a representation that is relevant when making predictions for Villa. It might include similar directory titles, types of content, or any other relevant features.


- Better Generalization: Training on a more extensive dataset that includes both real and dummy data can help the model generalize better, especially when the real data is limited. This can potentially improve the model's accuracy when predicting unknown data.


- Performance Estimation: The Root Mean Squared Error (RMSE) calculated can give an estimate of how well the model might perform when predicting clicks for Villa. This estimate is more reliable when the dummy data closely matches what Villa's actual data might look like.
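As a minimal sketch of the performance estimate described in the last bullet, RMSE can be computed as shown here. The click counts are hypothetical and only illustrate the calculation; they are not taken from the Bloc data, and the function name `rmse` is our own.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical click counts, for illustration only.
actual_clicks = [12, 7, 30, 4]
predicted_clicks = [10, 9, 26, 4]

print(round(rmse(actual_clicks, predicted_clicks), 2))
```

RMSE is in the same unit as the target (clicks), so it can be read as a typical prediction error per item.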

------------------------------------------------------------------------------------------------------------------------------------------------------------------

# Data Pre-processing

![image.png](attachment:image.png)

The summary statistics reveal the distribution and central tendency of numerical features in the dataset. These statistics offer a comprehensive understanding of usage patterns within the Bloc app, highlighting varying levels of engagement across different features and the distribution of user activity over time.

For instance, when comparing the click counts of different features, such as events and directory listings, their respective standard deviations let us put the observed differences in context. For example, if one event receives 12 more clicks than another and this difference equals 1 standard deviation, it suggests a moderate difference. Conversely, if one directory listing receives 12 more clicks than another and this difference equals 2 standard deviations, it indicates a more substantial difference in engagement levels. By leveraging summary statistics, we can interpret and compare the significance of changes across different features within the dataset.
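The comparison above amounts to dividing a raw click difference by the feature's standard deviation. Here is a small sketch; the standard deviations of 12 and 6 are assumptions chosen only to reproduce the 1 and 2 standard deviation example in the text.

```python
def difference_in_sds(click_difference, feature_std):
    """Express a raw click difference in units of the feature's standard deviation."""
    if feature_std <= 0:
        raise ValueError("feature_std must be positive")
    return click_difference / feature_std

# Assumed standard deviations, chosen to match the example in the text.
event_std = 12.0      # hypothetical: event clicks vary more
directory_std = 6.0   # hypothetical: directory clicks vary less

print(difference_in_sds(12, event_std))      # 1.0
print(difference_in_sds(12, directory_std))  # 2.0
```

The same 12-click gap counts for more in the feature whose clicks vary less.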

![image.png](attachment:image.png)![image-2.png](attachment:image-2.png)

Displaying the total number of users per school provides insight into the size of each school's user base. Additionally, the total and average clicks for different app features, such as events, media, and directory listings, offer a nuanced understanding of user interaction preferences within each school community. Furthermore, the metrics on total and average days logged in per school shed light on the frequency and consistency of app usage among users.

Displaying this information in a consolidated format lets us compare schools directly, revealing trends, patterns, and disparities in app engagement and utilization. For instance, we can see that Nerinx Hall's and St. Joseph's average event, media, and directory clicks are roughly the same, even with a 135-student difference. We can hypothesize from this that these schools host relatively similar events that gain traction. We can also hypothesize that IWA does not host the event or events that these schools host.

# Exploratory Data Analysis

### Distribution of Total Users Per School

![image.png](attachment:image.png)

We started by exploring the data. First, we wanted to see the distribution of users across schools: IWA has 21%, St. Joseph's has 36%, and Nerinx Hall has 43%. This distribution matters because we wanted to include a small, a medium, and a large school. It also helps us later to judge which schools get the most engagement relative to their size. For example, if IWA and Nerinx Hall have the same number of media clicks, we will know that media is more popular at IWA on a per-user basis.

### Analysis of Event Clicks Distribution and Days Logged In Distribution

![image.png](attachment:image.png)

The Event Clicks Distribution graph shows that most events get 0-10 clicks. The graph starts high and slowly tapers off, which tells us that very few events get 40 or more clicks. This is useful because we now know there are not many highly popular events. We can look for correlations among the most popular events so that schools can continue to hold them. We can also see that many events get 0-5 clicks. We can review these events to decide which ones to continue and which are not worth doing because of the low engagement.

We compare this with the Days Logged In Distribution to see visually whether some events are more popular than others simply because more users happened to be logged in at the time. From these graphs, there is no visible correlation between days logged in and event clicks. The graph shows that users are logged in until the last 20 days of the semester, with many people looking for events during days 30-40 and 85-95. This is useful because we can look to schedule events during those periods for the most engagement.

### Analysis of the Mean Event Clicks and Days Logged in per school

![image.png](attachment:image.png)

We then looked at event clicks per school and days logged in per school to see the relationship between the specific school, how many days its users are logged in, and how many event clicks the school gets per day. The graph suggests a linear relationship between the average days logged in and the average number of clicks each event gets. A hypothesis from this graph is that, in order to get more engagement, we need to increase the average number of days logged in. We came to this hypothesis because Incarnate Word's average days logged in was lower than Nerinx's.

### Engagement Metrics

![image.png](attachment:image.png)![image-2.png](attachment:image-2.png)

We then wanted to see event clicks and days logged in for each identification group at each school. This distribution shows which group of users is most engaged, which helps us know whether our posts cater more to one identification group or whether we are missing our target group entirely. These graphs show that, at all three schools, the number of event clicks per identification group is roughly the same. Total days logged in deviates somewhat more, but it is also relatively similar across groups.

### Correlation Heatmap

![image.png](attachment:image.png)

We then created a correlation heatmap of our selected features to show whether any features are correlated with each other. The two features with a high correlation are event clicks and days logged in. This suggests that users who interact with events more often also tend to spend more days logged in, and that one way to increase event clicks is to get users onto the app more often.

### Correlation Matrix

![image.png](attachment:image.png)

The correlation matrix shows the relationships between the different engagement metrics in the dataset. Notably, there is a strong positive correlation (0.76) between "Event Clicks" and "Days Logged In," suggesting that individuals who interact with events more often also tend to spend more days logged in. In contrast, the correlation between "Media Clicks" and the other metrics appears less consistent, indicating a weaker relationship. Moreover, "Directory Clicks" has little relationship with the other metrics, suggesting that users' interactions with directory items are largely independent of their interactions with media or events. Overall, the matrix helps us analyze user activity within the platform by showing how the engagement metrics relate to one another.
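For reference, here is a minimal sketch of how a Pearson correlation matrix like the one above can be computed. The per-user values are hypothetical, and the helper names `pearson` and `correlation_matrix` are our own; the notebook's actual computation may differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(features):
    """Pairwise Pearson correlations as a nested dict keyed by feature name."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}

# Hypothetical per-user values, for illustration only.
features = {
    "days_logged_in": [5, 20, 40, 60, 90],
    "event_clicks": [1, 4, 6, 15, 20],
    "directory_clicks": [3, 9, 2, 8, 4],
}

matrix = correlation_matrix(features)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

A value near 1 means two metrics rise together, a value near 0 means little linear relationship, and correlation alone does not show which metric drives the other.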

### Media Comparison

![image.png](attachment:image.png)

The results above indicate that, on average, the directory feature receives significantly more clicks compared to media and events, with a standardized score that is 1.41 standard deviations above the mean of the dataset. This suggests that the directory is the most popular feature among the 3 features in Bloc, with users engaging with it more than media or event listings. In contrast, media and events have negative standardized scores of -0.68 and -0.73, respectively, indicating that they are below the average engagement level of this group of features. This metric could imply that users find the directory listings more relevant or useful, leading to higher interaction rates.
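A standardized score of this kind can be sketched as follows. The mean click values are hypothetical placeholders, and the use of the population standard deviation is an assumption, although it is consistent with the three reported scores, which sum to zero.

```python
import math

def z_scores(values):
    """Standardize values using the population standard deviation (ddof = 0)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mean) / std for v in values]

# Hypothetical mean clicks per feature, for illustration only.
mean_clicks = {"directory": 30.0, "media": 9.0, "events": 8.0}

scores = dict(zip(mean_clicks, z_scores(list(mean_clicks.values()))))
for feature, score in scores.items():
    print(f"{feature}: {score:+.2f}")
```

With only three features, one large value pushes the other two below the mean, so the scores describe relative standing within this group rather than absolute engagement.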

This makes sense because the directory is a consolidated space in the app that links to other software and apps that are helpful to users at a school. An example would be having Canvas in the directory if Bloc were being used at Maryville.

The media and directory features are both secondary features of the app. The main feature is events, since it helps users stay informed and connected with their institution and makes it easier for the institution to consolidate multiple calendars and boost engagement around campus. So, there are two options we can draw from these results: either make the entire app look like the directory (image below) or replace media with another feature.

![image.png](attachment:image.png)

### Automation Cost

To automate the event and media posts, Bloc costs around $120 per month. Below is the calculation of how much automation currently costs per post:

- Total Events: 1226
- Total Media: 363
- Total Spend: $0.75 per post

Below is how much automation would cost per post when excluding the media feature:

- Total Spend: $0.16 per post

Bloc would save around $0.59 per post, or decrease spend by around 78.6% without including media in the automation. Based on the high cost of automating media posts, in combination with the lower popularity of the feature, it may be worth looking for another feature to implement and replace media. 

### Popular Directory Titles

![image.png](attachment:image.png)

Above is an image of the top 3 directories utilized at each school. This was a base metric that allows us to see what may be of interest at each school over other categories/topics. It not only shows us the popularity of the top directories, but also shows us what matters most at a school (culture) and what interests users may have at each school.

### Suggested Directories

![image.png](attachment:image.png)

The goal of the image above is to suggest directories that are relevant to a particular school, based on what is popular at other schools with similar content. To do this, we look at the total users at each school combined with the popularity of directories at each school. This allows us to make directory category/topic suggestions to other schools, while also showing which directories should be kept if schools already have that directory.

For example, "Athletics" is a popular directory among the 3 schools, so it would be wise to keep it. "Service Learning" is what Nerinx uses to help students who are struggling in a class, but it is not in the directory at St. Joseph's Academy or Incarnate Word Academy. So, it would be wise for those schools to promote it if they do provide the service, or to create something similar, since it may be very useful for students and parents.
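The suggestion logic can be sketched in simplified form as a set comparison: directories used at other schools but missing at the target school. This sketch ignores the popularity and user-count weighting described above, uses only the two directory names mentioned in the text, and the function name `suggest_directories` is hypothetical.

```python
def suggest_directories(directories_by_school, target_school, min_schools=1):
    """Directories used at >= min_schools other schools but absent at target_school."""
    existing = directories_by_school[target_school]
    counts = {}
    for school, directories in directories_by_school.items():
        if school == target_school:
            continue
        for name in directories:
            if name not in existing:
                counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, count in counts.items() if count >= min_schools)

# Directory sets based only on the example in the text.
directories_by_school = {
    "Nerinx": {"Athletics", "Service Learning"},
    "St. Joseph's Academy": {"Athletics"},
    "Incarnate Word Academy": {"Athletics"},
}

print(suggest_directories(directories_by_school, "St. Joseph's Academy"))
```

Raising `min_schools` would restrict suggestions to directories that several other schools already use.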

## A/B Testing

Since the goal of the app is to enhance communication and engagement, we decided to test push notifications. Push notifications previously sent by schools in Bloc, as well as those in other social apps on the market, have proved to be a great way to send information directly to users and bring them back into the app to engage with content.

Previously, events and media posts did not have push notifications unless the school sent one manually. For this test, which started on Wednesday (4/17/2024), we turned on push notifications at all 3 schools for events and media. Scheduled notifications were sent to all users one hour before every event, and instant notifications were sent when media was posted.

### Events

For a short review, I will list the events from Incarnate Word below and an overview of St. Joseph's Academy and Nerinx:

Events:
- Fine Arts Knight - Spring Concert/Art Show    | 42 clicks
- Red Knight Summer Camps 2024                  | 59 clicks
- Varsity Lacrosse vs. Notre Dame High School   | 19 clicks
- Varsity Soccer vs. St. Charles West           | 23 clicks
- JV Soccer vs. St. Charles West                | 15 clicks
- Varsity Lacrosse vs. Edwardsville             | 15 clicks
- JV Lacrosse vs. Edwardsville                  |  8 clicks
- Varsity Soccer vs. Villa Duchesne             | 25 clicks
- JV Soccer vs. Villa Duchesne                  | 11 clicks
- Varsity Lacrosse vs. Ursaline Academy         | 22 clicks
- JV Lacrosse vs. Ursaline Academy              | 19 clicks

Events saw an average engagement rate of 23.45. Previously, Incarnate Word Academy's average event engagement rate was 15.91.

We wanted to take a random sample for comparison, since we only have a week's worth of test data. When pulling in 3 random weeks, the results were:
- Week ending August 13, 2023: Average of 14.33 clicks
- Week ending September 10, 2023: Average of 17.55 clicks
- Week ending November 26, 2023: Average of 19.22 clicks

For the random weeks, the overall average engagement was 17.03, which is higher than the overall average event engagement of 15.91. So, we use the random-sample average as the baseline, since it gives us a better approximation when comparing against a single week of test data. We expect engagement to increase by approximately 37.7% (from 17.03 to 23.45) when utilizing push notifications for events at Incarnate Word Academy.
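The arithmetic behind the 37.7% figure can be reproduced from the numbers listed above. This is a sketch of the calculation only; the helper names are our own.

```python
# Clicks for the 11 Incarnate Word Academy events during the test week (from the list above).
test_week_clicks = [42, 59, 19, 23, 15, 15, 8, 25, 11, 22, 19]

# Average clicks for the 3 randomly sampled baseline weeks (from the list above).
baseline_week_averages = [14.33, 17.55, 19.22]

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def percent_increase(baseline, observed):
    """Relative change from baseline to observed, in percent."""
    return (observed - baseline) / baseline * 100

test_avg = mean(test_week_clicks)
baseline_avg = mean(baseline_week_averages)
uplift = percent_increase(baseline_avg, test_avg)

print(f"test week average: {test_avg:.2f}")
print(f"baseline average:  {baseline_avg:.2f}")
print(f"estimated uplift:  {uplift:.1f}%")
```

Note that the baseline is a mean of weekly means, so each sampled week counts equally regardless of how many events it contained. With one test week, the uplift is an estimate and not a statistically tested effect.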

Below is the overview of all schools:
- Incarnate Word Academy | 37.7%
- St. Joseph's Academy   | 24.1%
- Nerinx                 | 22.2%

Based on these results, push notifications, sent an hour before an event, seem to provide a high engagement increase for events.

### Media

I won't go as in depth since there are a lot more media posts, but I will give an example of Incarnate Word Academy again.

We wanted to take a random sample, since we only have a week's worth of data. So, when pulling in 3 random weeks, from the same months as the events to keep it consistent, the results were:

- Week ending August 13, 2023: Average of 14.33 clicks
- Week ending September 3, 2023: Average of 9.30 clicks
- Week ending November 26, 2023: Average of 19.22 clicks

For the random weeks, the overall average engagement was 14.29, and the average for the test week was 20.48. We expect engagement to increase by approximately 43.32% when utilizing push notifications for media at Incarnate Word Academy.

Below is the overview of all schools:

- Incarnate Word Academy | 43.32%
- St. Joseph's Academy | 69.82%
- Nerinx | 50.15%

Based on these results, push notifications, sent when a media post is created, seem to provide a high engagement increase for media posts.

### Conclusion

Since there is a large increase in engagement when events and media are supplemented with push notifications, there are two ways we can maintain or possibly increase engagement. One would be just for events, where we can test different times the notification is sent relative to an event. So, rather than one hour before, we could do 30 minutes before, or an hour and a half before, etc. The second option is to allow users to customize their own notification preferences: turning notifications off or on, setting a scheduled notification, or adding conditionals, such as notifying when an event is favorited.

I think push notifications can greatly increase the amount of engagement and time spent in Bloc at all schools.

------------------------------------------------------------------------------------------------------------------------------------------------------------------

# Predictive Model:

### Comparison of Event Clicks

![image.png](attachment:image.png)

The comparison of event clicks between Villa Duchesne and the other schools shows how the generated dummy data looks relative to the real data. From this comparison, we conclude that our dummy data is similarly distributed and is a reasonable representation of the other three schools. This check matters because it helps ensure that the data was created properly and can plausibly represent Villa's user base.
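One simple way such proxy data could be produced and checked is to resample click counts from the reference schools and compare summary statistics. This is a sketch under that assumption; it is not necessarily how the Villa dummy data was generated, and the reference values and the function name `simulate_clicks` are hypothetical.

```python
import random
import statistics

def simulate_clicks(reference_clicks, n_records, seed=0):
    """Draw n_records click counts, with replacement, from the pooled reference data."""
    if not reference_clicks:
        raise ValueError("reference_clicks must not be empty")
    rng = random.Random(seed)  # fixed seed so the simulated data is reproducible
    return [rng.choice(reference_clicks) for _ in range(n_records)]

# Hypothetical pooled event clicks from the reference schools, for illustration only.
reference_clicks = [0, 2, 3, 5, 5, 8, 10, 12, 19, 25, 42]

simulated = simulate_clicks(reference_clicks, n_records=200)

# Compare basic summary statistics of the real and simulated distributions.
print("reference mean:", round(statistics.mean(reference_clicks), 2))
print("simulated mean:", round(statistics.mean(simulated), 2))
print("reference median:", statistics.median(reference_clicks))
print("simulated median:", statistics.median(simulated))
```

Because the simulated values are drawn from the reference data, similar summary statistics confirm the sampling worked. They do not by themselves confirm that Villa's real usage would look the same.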

![image.png](attachment:image.png)

This information shows the average number of clicks for events, media, and the directory, along with the average days logged in. It shows that Villa is predicted to be close to Nerinx and St. Joseph's. We can take this data to Villa and show them the results we predict if they take a similar approach to Nerinx and St. Joseph's. To improve upon these results, we will show the most popular events that these other schools put on later in the notebook.

### Model for Events

![image.png](attachment:image.png)

The plot above shows the actual versus predicted event clicks from the predictive model we trained. The diagonal red line represents where the predicted values would lie if they were perfect predictions, exactly matching the actual clicks. The scatter of blue points illustrates how the predictions vary compared to the actual values.

So, the data and plot indicate that the model predicts the data reasonably accurately.

### Model for Media

![image.png](attachment:image.png)

### Villa Duchesne Automation Cost

To automate the event and media posts, Bloc costs around $120 per month. Below is the calculation of how much automation currently costs per post:

- Total Events: 1226
- Total Media: 363
- Total Spend: $0.75 per post

Below is how much automation would cost per post when adding Villa's data:

- Total Villa Events: 409
- Total Villa Media: 121
- Total Spend: $0.56 per post

When implementing Villa, Bloc would save around $0.19 per post, or decrease spend by around 25.3%, by automating events and media. Originally, the conclusion was that, based on the high cost of automating media posts in combination with the lower popularity of the feature, it may be worth looking for another feature to replace media. However, it may instead be worth keeping media and expanding to other schools to reduce the cost per post.
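The per-post figures above can be approximated by dividing a fixed total spend by the number of posts. The total of $1,200 used here is an assumption (for example, ten months at $120 per month); it is not stated in this notebook, but it is consistent with the $0.75 and $0.56 figures.

```python
ASSUMED_TOTAL_SPEND = 1200.0  # assumption: e.g. ten months at $120 per month

def cost_per_post(total_spend, *post_counts):
    """Fixed total spend divided by the combined number of posts."""
    total_posts = sum(post_counts)
    if total_posts <= 0:
        raise ValueError("total number of posts must be positive")
    return total_spend / total_posts

# Post counts from the lists above.
current = cost_per_post(ASSUMED_TOTAL_SPEND, 1226, 363)
with_villa = cost_per_post(ASSUMED_TOTAL_SPEND, 1226, 363, 409, 121)

saving = current - with_villa
percent_saving = saving / current * 100

print(f"current cost per post:    ${current:.3f}")
print(f"with Villa cost per post: ${with_villa:.3f}")
print(f"saving per post:          ${saving:.3f} ({percent_saving:.1f}%)")
```

Under this assumption, the unrounded saving comes out at roughly 25.0%, slightly below the 25.3% obtained from the rounded per-post figures. The main point holds either way: a fixed automation cost spread over more posts lowers the cost per post.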

### Model for Directory

![image.png](attachment:image.png)

The plot above shows the actual versus predicted directory clicks from the predictive model we trained. The diagonal red line represents where the predicted values would lie if they were perfect predictions, exactly matching the actual clicks. The scatter of blue points illustrates how the predictions vary compared to the actual values.

So, the data and plot indicate that the model predicts the data reasonably accurately.

![image.png](attachment:image.png)

We wanted to analyze the effectiveness of different directories at the 3 schools by considering not just the raw number of clicks each directory receives, but the number of clicks relative to the number of users at each school. This provides a more normalized measure of directory popularity/engagement, as it accounts for the different sizes of the user bases at each school.
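The normalization described above can be sketched as clicks divided by users. The school labels and counts here are hypothetical placeholders, not figures from the Bloc data.

```python
def clicks_per_user(total_clicks, total_users):
    """Directory clicks normalized by the size of the school's user base."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return total_clicks / total_users

# Hypothetical schools and counts, for illustration only.
schools = {
    "School A": {"users": 200, "directory_clicks": 900},
    "School B": {"users": 400, "directory_clicks": 900},
}

normalized = {
    name: clicks_per_user(data["directory_clicks"], data["users"])
    for name, data in schools.items()
}

for name, value in normalized.items():
    print(f"{name}: {value:.2f} clicks per user")
```

Both hypothetical schools have the same raw click total, but the smaller school shows twice the per-user engagement, which is the effect the normalization is meant to expose.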

The results give us recommendations that can improve the effectiveness of the directory feature in Bloc at Villa and should greatly reduce implementation time for this feature.