Foundations of Computer Science (FoCS) - Project

Overview | Instructions | Datasets | Conclusion | About me

☍ Overview

This repository contains my solution to the final project for a course in the Master's Degree in Data Science that I am attending at the University of Milano-Bicocca.
The project consists of a set of exercises whose number depends on the size of the team, all based on two datasets (available in the "Datasets" section).
I worked on the project individually, so I only had to solve the first 12 exercises.
The exercises are solved in Python with the libraries we chose to use, most notably numpy and pandas, which were covered in the course lectures.
The main purpose of the project is to manipulate these datasets and extract a variety of information about the Google Play Store and its applications.

☍ Instructions

Starting from the Google Play Store dataset, all groups and individuals must do the following:

References	To do list	Team size	Status
Exercise 1	Convert the app sizes to a number	1	✔
Exercise 2	Convert the number of installs to a number	1	✔
Exercise 3	Transform “Varies with device” into a missing value	1	✔
Exercise 4	Convert Current Ver and Android Ver into a dotted number (e.g. 4.0.3 or 4.2)	1	✔
Exercise 5	Remove the duplicates	1	✔
Exercise 6	For each category, compute the number of apps	1	✔
Exercise 7	For each category, compute the average rating	1	✔
Exercise 8	Create two dataframes: one for the genres and one bridging apps and genres. So that, for instance, the app Pixel Draw - Number Art Coloring Book appears twice in the bridging table, once for Art & Design, once for Creativity	1	✔
Exercise 9	For each genre, create a new column of the original dataframe. The new columns must have boolean values (True if the app has a given genre)	1	✔
Exercise 10	For each genre, compute the average rating. Which genre has the highest average?	1	✔
Exercise 11	For each app, compute the approximate income, obtained as the product of the number of installs and the price	1	✔
Exercise 12	For each app, compute its minimum and maximum Sentiment_polarity	1	✔

Exercise 1	For each app, compute the average number of words in its reviews	2	✖
Exercise 2	For each app, compute its longest review	2	✖
Exercise 3	For each app, compute the ratio between the number of installs and the number of reviews	2	✖
Exercise 4	Cluster the apps according to the major android version (the first two digits — e.g. for 4.0.3 the major version is 4.0)	2	✖
Exercise 5	For each cluster, compute the average date and the last date of an update.	2	✖
Exercise 6	Excluding the free apps, what is the content rating with highest average price?	2	✖

Exercise 1	What is the genre with the highest total income?	3	✖
Exercise 2	What is the genre with the highest fraction of free apps (over the number of all apps)?	3	✖
Exercise 3	For each rating, compute the average income	3	✖
Exercise 4	For each (Content Rating, Genre) pair, compute the number of reviews and the average rating	3	✖
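
As a rough illustration of what the single-person Exercises 1 to 3 and 8 ask for, here is a minimal pandas sketch. It is not taken from the notebook in this repository. The column names (`App`, `Size`, `Installs`, `Genres`), the value formats (`19M`, `201k`, `100,000+`, genres separated by `;`), the decimal size multipliers and the two "Example App" rows are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the Google Play Store CSV.
# Column names and value formats are assumptions, not read from the real file.
apps = pd.DataFrame({
    "App": ["Pixel Draw - Number Art Coloring Book", "Example App A", "Example App B"],
    "Size": ["19M", "201k", "Varies with device"],
    "Installs": ["100,000+", "5,000+", "1,000,000+"],
    "Genres": ["Art & Design;Creativity", "Tools", "Tools"],
})


def size_to_bytes(value):
    """Turn '19M' or '201k' into a number of bytes; anything else becomes NaN."""
    multipliers = {"k": 1_000, "M": 1_000_000}  # assumed decimal units
    if isinstance(value, str) and value and value[-1] in multipliers:
        try:
            return float(value[:-1]) * multipliers[value[-1]]
        except ValueError:
            return np.nan
    return np.nan


# Exercise 3: "Varies with device" becomes a missing value.
apps = apps.replace("Varies with device", np.nan)

# Exercise 1: app sizes as numbers.
apps["Size"] = apps["Size"].map(size_to_bytes)

# Exercise 2: strip "+" and "," from the install counts, then convert.
apps["Installs"] = pd.to_numeric(
    apps["Installs"].str.replace(r"[+,]", "", regex=True), errors="coerce"
)

# Exercise 8: one row per (app, genre) pair, plus a table of distinct genres.
bridge = (
    apps[["App", "Genres"]]
    .assign(Genre=lambda d: d["Genres"].str.split(";"))
    .explode("Genre")[["App", "Genre"]]
    .reset_index(drop=True)
)
genres = pd.DataFrame({"Genre": sorted(bridge["Genre"].unique())})
```

With this layout, an app that lists two genres appears twice in `bridge`, once per genre, which matches the Pixel Draw example in Exercise 8.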

Notes

It is mandatory to use GitHub for developing the project;
The project must be a Jupyter notebook;
There is no restriction on the libraries that can be used, nor on the Python version;
Post any question on the Discussions forum.

☍ Datasets

Here you can find the Google Play Store datasets used for the project.
Note that clicking on the link will redirect you to the raw format of the datasets, from which you can download the CSV files you need.

◃ Google Play Store

◃ Google Play Store: User reviews

To see an example of what you can find on Google Play, click on the image below.

☍ Conclusion

This project did not have a single specific goal, so there is not much to report as a conclusion.
Still, this course and its project-based examination gave me the opportunity to become familiar with a language that was new to me and that will certainly play an important role in my future work. Furthermore, extracting information from the datasets can reveal hidden trends in the data, which helps to form a clearer and more precise idea of a field you may not be fully familiar with.

☍ About me

Hi everybody, my name is Alessandro Borroni and I am a Data Science student based in Milan, as the picture above already suggests.
I have a great passion for Photography and Mathematics.
I previously earned a degree in Business Economics at the University of Milano-Bicocca. Thanks to this degree I developed, among other things, an interest in Statistics and Finance.

Below you can find some of my social media channels. Check them out if you want!
