pergran1/README.md

welcome

About me

I am interested in learning more about data engineering and database management, and in developing further in those areas! I have a solid understanding of how to analyze and manage data using Python, R, SQL and Excel VBA.

What I am most passionate about right now is ETL pipelines, data in the cloud, and learning Scala and Java.

  • One of the big projects I have been working on lately is an entire ETL pipeline that web scrapes fanfiction data and stores it in a database. I have been using both Redshift on AWS and a PostgreSQL database so that I can try different approaches on each. I also used Airflow and AWS Lambda to download the data automatically.

  • 📖 I’m currently learning AWS, Glue, Scala and Java


Most used languages on GitHub:

Top Langs

Fanfiction project with Airflow, PostgreSQL and AWS

This is a very fun project that showcases some of my knowledge of databases, SQL, Airflow and AWS. In short, I web scrape fanfiction data and calculate KPIs using two different methods.

This project can be viewed here: Fanfiction project
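A minimal sketch of the scrape, store and KPI steps described above. It is not the project's actual code: the HTML snippet, the `stories` table, the column names and the KPI (story count and average word count per fandom) are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL or Redshift so the example runs with the standard library only.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page content; the markup of the real site is an assumption.
SAMPLE_HTML = """
<ul>
  <li class="story" data-words="1200" data-fandom="Harry Potter">The First Story</li>
  <li class="story" data-words="800" data-fandom="Harry Potter">A Second Story</li>
  <li class="story" data-words="3000" data-fandom="Naruto">Third Story</li>
</ul>
"""


class StoryParser(HTMLParser):
    """Extract step: collect one dict per <li class="story"> element."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "story":
            self._current = {
                "title": "",
                "fandom": attrs["data-fandom"],
                "words": int(attrs["data-words"]),
            }

    def handle_data(self, data):
        # Text is only collected while inside a story element.
        if self._current is not None:
            self._current["title"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            self.stories.append(self._current)
            self._current = None


def extract(html):
    parser = StoryParser()
    parser.feed(html)
    return parser.stories


def load(conn, stories):
    """Load step: write the scraped rows into a (hypothetical) stories table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stories (title TEXT, fandom TEXT, words INTEGER)"
    )
    conn.executemany(
        "INSERT INTO stories VALUES (:title, :fandom, :words)", stories
    )
    conn.commit()


def kpi_per_fandom(conn):
    """KPI step: story count and average word count per fandom, computed in SQL."""
    cursor = conn.execute(
        "SELECT fandom, COUNT(*), AVG(words) FROM stories "
        "GROUP BY fandom ORDER BY fandom"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL / Redshift
load(conn, extract(SAMPLE_HTML))
kpis = kpi_per_fandom(conn)
print(kpis)
```

In a scheduled setup, each of the three functions would typically become its own task, so that a failed scrape does not trigger the load or KPI steps.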



Endpoint for dirty text classification and Streamlit app

Another side project I did using the fanfiction data was to create a classification endpoint and a Streamlit app that classify whether a text is sexual. Hypothetically, this could be used to prevent children from reading messages or texts that seem to contain sexual content.

The endpoint is here: endpoint classification

The Streamlit app is here: Streamlit app for text classification
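To give an idea of what sits behind a text classification endpoint like this, here is a minimal sketch of a bag-of-words Naive Bayes classifier in plain Python. The model choice, the toy training sentences, the label names `flagged` and `ok`, and the `classify` helper are all assumptions made for illustration; the README does not say which model or data the real endpoint uses.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {}
        self.vocab = set()
        for text, label in zip(texts, labels):
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the log likelihood of every known token.
            score = math.log(n_docs / total_docs)
            for token in tokenize(text):
                if token in self.vocab:  # unseen words are ignored
                    score += math.log(
                        (counts[token] + 1) / (total_words + len(self.vocab))
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical, kept deliberately mild).
texts = [
    "explicit adult scene",
    "adult content explicit",
    "friends go on adventure",
    "a school adventure story",
]
labels = ["flagged", "flagged", "ok", "ok"]
model = NaiveBayes().fit(texts, labels)


def classify(payload):
    """Hypothetical endpoint handler: takes {'text': ...}, returns a label."""
    return {"label": model.predict(payload["text"])}


print(classify({"text": "an explicit scene"}))
print(classify({"text": "a school story"}))
```

A web framework or a Streamlit text box would only need to call `classify` with the user's input, which keeps the model logic separate from the serving layer.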


Dashboards made using R and Flexdashboard

The two links below are to dashboards created in R 👨‍💻 using the packages Shiny and Flexdashboard. I also have extensive experience creating dashboards with Superset, Qlik and Data Studio.

tax

One of my first workplaces was the Swedish Tax Agency. I wanted to find out how people in Sweden write about the agency: with hate, love, satisfaction, or in connection with scandals?

I used Python and R in this project. Python was used to download around 42 000 old tweets that mention “Skatteverket”, which I was able to do with the Python package GetOldTweets3.

I used R and the package rtweet to download data from the official account of Skatteverket. The package can download the timeline of a Twitter account, which made it a good fit for analyzing the account of Skatteverket.


One of my friends was active in figure skating, and I noticed that he was listed on a website that has a lot of data on figure skating competitions and competitors. So I figured it would be fun to use Python to scrape the data and visualize it in a dashboard.

This project has a lot of sentimental value for me because it was the first one where I used Python to web scrape, and I remember how fun it was to see and collect data from a website! So, this is where it all started ☺️


Projects analyzing data with R

These are some of the projects that I made some years ago in R. These projects bring back fun memories. The links below take you to my external website so you can see the charts and data visualizations, but all the code is stored here on GitHub: R projects code

Lund University and the data of published theses
The fantastic and fun Lund University was where I studied for my master’s in finance and statistics. In this project I collected the data from this website, which lists every thesis written by students. I analyzed questions such as how many papers are published each year, how many authors there are, and which faculty produces the most papers. The plots and the project are not refined; I was going to go back and improve them, but that never happened.

Trends and Analysis of Suicide from WHO
Here I analyzed public data on suicides taken from WHO. It is an interesting topic that is always relevant, and I learned a lot about making plots.

Analyzing the Rise and Trends of Japanese Anime
Using public data from Kaggle, I analyzed users who watch anime, that is, Japanese animation.

Who Plays and Makes Levels in the Nintendo Game Mario Maker?
I grew up playing Nintendo games, so of course I had to analyze data for the game Mario Maker, which lets the players create their own levels! This was a very fun exercise in data wrangling.

Classifying bank customers with Log, Net and Tree
This was a short project made at Lund University. My task was to try different approaches to classifying customers who would default on their loans. The data is public and was taken from Kaggle.

Popular repositories

  1. LuleaJavaCourses (Public, Java)

  2. Flashback-scraper (Public, Jupyter Notebook)

  3. Fanfiction-analys-and-web-scrapeing (Public, Jupyter Notebook)

  4. Functions-and-practice (Public, Jupyter Notebook)

  5. Mario-Maker-analysis-in-python (Public, Jupyter Notebook)

  6. Streamlit-app-for-text-classification (Public, Python)