Algorithmic Methods of Data Mining (Sc.M. in Data Science)
Academic year 2023–2024
Homework 2

The repository consists of the following files:
- HW2_Finale.ipynb: A Jupyter notebook with the solutions to all the homework questions, organized into the following sections:
  - Research questions [RQs]
    1. [RQ1] Exploratory Data Analysis (EDA)
    2. [RQ2] Retrieving some vital information from the dataset
    3. [RQ3] Historical look at the dataset
    4. [RQ4] Quirky questions about consistency
    5. [RQ5] Analysis of the influential authors with the most fans
    6. [RQ6] Deeper analysis of the top 10 authors by number of fans
    7. [RQ7] Estimating probabilities
    8. [RQ8] Charts, statistical tests, and analysis methods
  - Bonus Questions
    1. Using an alternative library to Pandas
    2. Text-mining analysis of the books.json 'description' field and the authors.json field
  - Command Line Question (CLQ)
  - AWS Question
  - Algorithmic Question (AQ)
- commandline_original.ps1: A script that uses command-line tools to report the titles of the top 5 series with the highest total 'books_count' across all of their associated books (CLQ1).
- commandline_LLM.ps1: A more robust script for the same task, implemented by an LLM (CLQ2).
- AWSQ: A folder containing the Python code and the CSV data for the AWS Question.