
A repository for the Foundations of Computer Science course project.


malborroni/Foundations_of_Computer-Science




Foundations of Computer Science (FoCS) - Project

Overview   |   Instructions   |   Datasets   |   Conclusion   |   About me  


☍   Overview

This repository contains my solution to a project completed for the examination of a course in the Master's Degree in Data Science that I am attending at the University of Milano-Bicocca.
The project consists of solving a number of exercises that varies with the size of the team, using two datasets (available in the "Datasets" section).
I worked on the project individually, so I only had to solve the first 12 exercises.
The exercises require the correct use of Python, together with the libraries we chose to use, notably numpy and pandas, which were covered during the course lectures.
The main purpose of the project is to manipulate these datasets and extract a variety of information about the Google Play Store and its applications.

☍   Instructions

Starting from the Google Play Store dataset, all groups and individuals must do the following:

| Reference | To-do | Team size |
|---|---|---|
| Exercise 1 | Convert the app sizes to a number | 1 |
| Exercise 2 | Convert the number of installs to a number | 1 |
| Exercise 3 | Transform “Varies with device” into a missing value | 1 |
| Exercise 4 | Convert Current Ver and Android Ver into a dotted number (e.g. 4.0.3 or 4.2) | 1 |
| Exercise 5 | Remove the duplicates | 1 |
| Exercise 6 | For each category, compute the number of apps | 1 |
| Exercise 7 | For each category, compute the average rating | 1 |
| Exercise 8 | Create two dataframes: one for the genres and one bridging apps and genres, so that, for instance, the app Pixel Draw - Number Art Coloring Book appears twice in the bridging table, once for Art & Design and once for Creativity | 1 |
| Exercise 9 | For each genre, create a new column in the original dataframe. The new columns must hold boolean values (True if the app has the given genre) | 1 |
| Exercise 10 | For each genre, compute the average rating. Which genre has the highest average? | 1 |
| Exercise 11 | For each app, compute the approximate income, obtained as the product of the number of installs and the price | 1 |
| Exercise 12 | For each app, compute its minimum and maximum Sentiment_polarity | 1 |
| Exercise 1 | For each app, compute the average number of words in its reviews | 2 |
| Exercise 2 | For each app, compute its longest review | 2 |
| Exercise 3 | For each app, compute the ratio between the number of installs and the number of reviews | 2 |
| Exercise 4 | Cluster the apps according to the major Android version (the first two digits, e.g. for 4.0.3 the major version is 4.0) | 2 |
| Exercise 5 | For each cluster, compute the average date and the last date of an update | 2 |
| Exercise 6 | Excluding the free apps, which content rating has the highest average price? | 2 |
| Exercise 1 | Which genre has the highest total income? | 3 |
| Exercise 2 | Which genre has the highest fraction of free apps (over the number of all apps)? | 3 |
| Exercise 3 | For each rating, compute the average income | 3 |
| Exercise 4 | For each (Content Rating, Genre) pair, compute the number of reviews and the average rating | 3 |
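
As an illustration of the first cleaning steps (team size 1, Exercises 1 to 3 and 8), here is a minimal sketch in plain Python. It is not the solution contained in the notebook. It assumes that sizes look like `19M` or `201k`, that install counts look like `10,000+`, and that multiple genres are separated by `;`. The helper names (`parse_size`, `parse_installs`, `bridge_rows`) are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed unit suffixes (hypothetical: the real data may use others).
UNIT_FACTORS = {"k": 1_000, "M": 1_000_000}
MISSING_MARKER = "Varies with device"


def parse_size(raw: str) -> Optional[float]:
    """Convert a size string such as '19M' to a number; None if missing."""
    text = raw.strip()
    if text == MISSING_MARKER:
        return None  # Exercise 3: treat the marker as a missing value
    factor = 1
    if text and text[-1] in UNIT_FACTORS:
        factor = UNIT_FACTORS[text[-1]]
        text = text[:-1]
    try:
        return float(text) * factor
    except ValueError:
        return None  # unparseable values are also treated as missing


def parse_installs(raw: str) -> Optional[int]:
    """Convert an install count such as '10,000+' to an integer."""
    text = raw.strip().rstrip("+").replace(",", "")
    return int(text) if text.isdigit() else None


def bridge_rows(app: str, genres: str) -> List[Tuple[str, str]]:
    """Return one (app, genre) pair per genre, as in a bridging table."""
    return [(app, g.strip()) for g in genres.split(";") if g.strip()]
```

With pandas, helpers like these could be applied to a whole column through `Series.map`, for example `df["Size"].map(parse_size)`, assuming the column is named `Size` and contains only strings.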

Notes

  1. It is mandatory to use GitHub for developing the project;
  2. The project must be a Jupyter notebook;
  3. There is no restriction on the libraries that can be used, nor on the Python version;
  4. Post any question on the Discussions forum.

☍   Datasets

Here you can find the Google Play Store datasets used for the project.
Note that clicking a link opens the raw version of the dataset, from which you can download the CSV file you need.

    ◃   Google Play Store

    ◃   Google Play Store: User reviews

To see an example of what you can find on Google Play, click on the image below.

☍   Conclusion

There was no specific goal to reach with this project, so I do not have much to say as a conclusion.
Still, it is worth saying that this course, with this method of examination, gave me the opportunity to become familiar with a language that was new to me and that will certainly play an important role in the work that awaits me. Furthermore, extracting information from datasets always offers the opportunity to reveal hidden trends in the data, which helps to form a clearer and more precise idea of a field with which you may not be fully familiar.

☍   About me

Hi everybody, my name is Alessandro Borroni and I am a Data Science student based in Milan, as the picture up there has already told you.
I have a great passion for Photography and Mathematics.
My previous achievement is a degree in Business Economics, obtained at the University of Milano-Bicocca. Thanks to this degree I developed, among other things, an interest in Statistics and Finance.

Below you can find some of my social media channels; check them out if you want!

 

