gfMateus99/Master_Thesis

Master Thesis Repository

Repository with the code for the thesis: Data and computer center prediction of usage and cost: An interpretable machine learning approach.

Thesis objective: This master's thesis was developed in collaboration with Novobanco. It uses interpretable machine learning models to predict the computational usage of the Novobanco data center, and it introduces a novel NLP-based method for exploring the impact of human context on that usage.

Note: This repository contains only the most important scripts developed for the thesis. Other scripts (for example, those for creating plots or managing parts of the data) are not included.

Built With:

Python, Jupyter

Organization of this repository

Ensemble Model (Interpretable)

  • EAMDrift - EAMDrift Model
    • Repository with the ensemble model. It contains a README.md file explaining how to use the model with your own data.

Baseline Models

Topic modelling + Sentiment analysis

  • Get Tweets Script - Get_Tweets_Program.py

    • Program that collects tweets via the Twitter API (Note: to use this script, you need to insert your own Twitter API tokens).
  • Pre-processing text analysis - Pre-processing Text.ipynb

    • Notebook that pre-processes text for sentiment analysis and topic modelling: text cleaning, tokenization, text reduction (stopword removal and removal of small words of 2 characters or fewer), stemming, and POS tagging.
  • Sentiment Analysis code - Sentiment Analysis.py

  • Topic Modelling model code - Topic Modelling.ipynb

    • Notebook that runs the topic model (Dirichlet Multinomial Mixture, DMM, with Gibbs sampling).
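
The pre-processing steps listed above (cleaning, tokenization, stopword and small-word removal, stemming) can be sketched roughly as follows. This is a minimal illustration, not the code from Pre-processing Text.ipynb: the stopword list, the regular expressions, and every function name here are assumptions made for the example. `crude_stem` is a toy suffix stripper standing in for a real stemmer, and POS tagging is left out because it needs a trained tagger from an NLP library.

```python
import re

# Tiny illustrative stopword list (assumption); the notebook's actual list is not shown here.
STOPWORDS = {"the", "and", "for", "with", "this", "that", "are", "was", "has", "have"}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
NON_LETTER = re.compile(r"[^a-z\s]")


def clean_text(text):
    """Lowercase, drop URLs and @mentions, then replace anything that is not a letter with a space."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    return NON_LETTER.sub(" ", text)


def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()


def reduce_tokens(tokens):
    """Remove stopwords and small words (2 characters or fewer)."""
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def crude_stem(token):
    """Toy stemmer (hypothetical): strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run cleaning, tokenization, reduction and stemming in that order."""
    return [crude_stem(t) for t in reduce_tokens(tokenize(clean_text(text)))]


if __name__ == "__main__":
    tweet = "The servers are crashing again! See https://t.co/abc @novobanco #outage"
    print(preprocess(tweet))
```

In practice, an established stemmer and POS tagger from an NLP library would replace `crude_stem`; the toy version is used only to keep the sketch dependency-free.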

Documents and Reports

Author

Gonçalo Furtado Mateus

License

Copyright © Gonçalo Furtado Mateus, NOVA School of Science and Technology, NOVA University Lisbon, Novobanco.

The NOVA School of Science and Technology, the NOVA University Lisbon and the Novobanco have the right, perpetual and without geographical boundaries, to file and publish this dissertation through printed copies reproduced on paper or on digital form, or by any other means known or that may be invented, and to disseminate through scientific repositories and admit its copying and distribution for non-commercial, educational or research purposes, as long as credit is given to the author and editor.
