Repository with the code for the thesis: Data and computer center prediction of usage and cost: An interpretable machine learning approach.
Thesis objective: This master's thesis was developed in collaboration with Novobanco. Its objective is to use interpretable machine learning models to predict the computational usage of the Novobanco data center. In addition, we develop a novel method that uses NLP techniques to explore the impact of human context on Novobanco data center usage.
Note: This repository contains only the most important scripts developed for this thesis. Some other scripts (for creating the plots, managing parts of the data, etc.) are not included.
Built With:
- EAMDrift - EAMDrift Model
- Repository with the ensemble model. It contains a README.md file explaining how to use this model with your own data.
- Exponential Smoothing - Baseline_Models-ExponentialSmoothing.py
- Long short-term memory (LSTM) - Baseline_Models-LSTM.py
- Prophet - Baseline_Models-Prophet.py
- Seasonal Autoregressive Integrated Moving Average (SARIMA) - Baseline_Models-SARIMA.py
- Transformer - Baseline_Models-Transformer.py
- Get Tweets Script - Get_Tweets_Program.py
- Program to collect tweets via the Twitter API (Note: to use this script, you need to insert your own Twitter API token keys).
- Pre-processing text analysis - Pre-processing Text.ipynb
- Notebook that pre-processes text for sentiment analysis and topic modelling: text cleaning, tokenization, text reduction (stopword removal and removal of short words of 2 characters or fewer), stemming, and part-of-speech (POS) tagging.
- Sentiment Analysis code - Sentiment Analysis.py
- Dictionary-based sentiment analysis using SentiLex-PT and EMOTAIX.PT dictionaries.
- Topic Modelling model code - Topic Modelling.ipynb
- Notebook that runs the topic modelling model (Dirichlet Multinomial Mixture (DMM) with Gibbs sampling).
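To illustrate the cleaning, tokenization and reduction steps listed for Pre-processing Text.ipynb, here is a minimal Python sketch. It is not the notebook's code: the regular expressions, the tiny stopword set and the function names (`clean_text`, `tokenize`, `reduce_tokens`, `preprocess`) are hypothetical. Stemming and POS tagging are left out; with NLTK, for example, Portuguese stemming is available through `nltk.stem.RSLPStemmer` after downloading the `rslp` resource.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only;
# a real pipeline would use a full Portuguese stopword list.
STOPWORDS = {"de", "que", "não", "para", "com", "uma", "mais", "por", "como", "mas"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Anything that is not a (Portuguese) lowercase letter or whitespace.
NON_LETTER_RE = re.compile(r"[^a-záàâãéêíóôõúç\s]")


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and @mentions, replace digits/punctuation with spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # also strips the '#' of hashtags
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization (the text is already cleaned)."""
    return text.split()


def reduce_tokens(tokens: list[str], stopwords=STOPWORDS, min_len: int = 3) -> list[str]:
    """Remove stopwords and short words (<= 2 characters)."""
    return [t for t in tokens if t not in stopwords and len(t) >= min_len]


def preprocess(text: str) -> list[str]:
    return reduce_tokens(tokenize(clean_text(text)))


print(preprocess("O Novobanco não abriu hoje! Veja https://t.co/abc @user #banco"))
```

Keeping each step as a separate function makes it easy to swap in a full stopword list or add stemming and POS tagging afterwards.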
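A dictionary-based approach, as used in Sentiment Analysis.py, scores a text by looking its tokens up in a lexicon. The sketch below assumes SentiLex-PT has been loaded into a word-to-polarity dictionary with values in {-1, 0, 1} and EMOTAIX.PT into a word-to-emotion-category dictionary. The lexicon entries and category names shown are hypothetical placeholders, not entries from either resource, and summing polarities is one simple aggregation choice, not necessarily the one the script uses.

```python
from collections import Counter

# Hypothetical placeholder lexicons, NOT real entries from SentiLex-PT or EMOTAIX.PT.
# Assumed formats: word -> polarity in {-1, 0, 1}; word -> emotion category.
POLARITY_LEXICON = {"bom": 1, "ótimo": 1, "satisfeito": 1,
                    "mau": -1, "lento": -1, "falha": -1, "normal": 0}
EMOTION_LEXICON = {"medo": "fear", "alegria": "joy", "raiva": "anger"}


def polarity(tokens, lexicon=POLARITY_LEXICON):
    """Sum the polarities of the tokens found in the lexicon and label the text."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    score = sum(hits)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {"score": score, "label": label, "matched": len(hits)}


def emotions(tokens, lexicon=EMOTION_LEXICON):
    """Count how many tokens fall into each emotion category."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


print(polarity(["serviço", "lento", "mas", "atendimento", "bom", "falha"]))
print(emotions(["medo", "alegria", "medo"]))
```

For lookups to match, tokens must be normalized the same way as the lexicon's word forms. This sketch also ignores negation and intensifiers, which a production scorer would usually handle.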
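The topic modelling notebook uses a DMM fitted with Gibbs sampling, a model that assigns each short text (such as a tweet) to a single topic. Below is a minimal from-scratch sketch of the collapsed Gibbs sampler commonly used for the DMM (often called GSDMM). It is not the notebook's implementation, and the function name `gsdmm` and the defaults for `K`, `alpha`, `beta` and `n_iters` are assumptions. In each sweep, every document is removed from its cluster and reassigned to cluster k with probability proportional to (m_k + alpha) times the predictive likelihood of its words under cluster k. Log-space computation avoids underflow.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet Multinomial Mixture.

    docs: list of token lists. Returns one cluster label per document.
    K is an upper bound on the number of clusters; some may end up empty.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    D = len(docs)
    doc_counts = [Counter(doc) for doc in docs]

    m_z = [0] * K                          # number of documents per cluster
    n_z = [0] * K                          # number of words per cluster
    n_zw = [Counter() for _ in range(K)]   # per-cluster word counts
    labels = [0] * D

    def move(d, k, sign):
        """Add (sign=+1) or remove (sign=-1) document d to/from cluster k."""
        m_z[k] += sign
        n_z[k] += sign * len(docs[d])
        for w, c in doc_counts[d].items():
            n_zw[k][w] += sign * c

    # Random initial assignment.
    for d in range(D):
        labels[d] = rng.randrange(K)
        move(d, labels[d], +1)

    for _ in range(n_iters):
        for d in range(D):
            move(d, labels[d], -1)
            log_p = []
            for k in range(K):
                # The normalizer (D - 1 + K * alpha) is the same for all k, so it is dropped.
                lp = math.log(m_z[k] + alpha)
                for w, c in doc_counts[d].items():
                    for j in range(c):
                        lp += math.log(n_zw[k][w] + beta + j)
                for i in range(len(docs[d])):
                    lp -= math.log(n_z[k] + V * beta + i)
                log_p.append(lp)
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            labels[d] = rng.choices(range(K), weights=weights)[0]
            move(d, labels[d], +1)
    return labels


docs = [["juros", "crédito", "banco"], ["crédito", "juros"], ["futebol", "golo", "jogo"],
        ["golo", "jogo"], ["banco", "juros"], ["jogo", "futebol"]]
print(gsdmm(docs, K=4, n_iters=20, seed=1))
```

Because K only bounds the number of topics, the number of non-empty clusters after sampling is itself an output of the model, which is one reason this sampler is popular for short texts.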
Gonçalo Furtado Mateus
- GitHub - github/gfMateus99
- Email - goncalomateus99@gmail.com
- LinkedIn - https://www.linkedin.com/in/gonçalo-mateus/
Copyright © Gonçalo Furtado Mateus, NOVA School of Science and Technology, NOVA University Lisbon, Novobanco.
The NOVA School of Science and Technology, the NOVA University Lisbon and Novobanco have the right, perpetual and without geographical boundaries, to file and publish this dissertation through printed copies reproduced on paper or on digital form, or by any other means known or that may be invented, and to disseminate through scientific repositories and admit its copying and distribution for non-commercial, educational or research purposes, as long as credit is given to the author and editor.