Whether you followed the link on my personal introduction page or arrived here some other way, welcome to my public repository, where I store my college coursework and personal practice on various data science topics. To learn more about what each script was used for, read on: this README gives a short description, and a link, for each script in the repository.
Most of the code here is not complete, as it comes from projects I am currently working on. Examples: dependency parsing, BERT models, sentiment analysis models, named entity recognition, and analysis scripts.
- In this project, I used decision tree and random forest classifiers to classify Spotify song genres.
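  A minimal sketch of the approach described above, using a synthetic stand-in for the Spotify data (the feature matrix and genre labels below are generated, not the actual dataset):

  ```python
  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  # Synthetic stand-in for audio features (danceability, energy, ...) and genres.
  X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                             n_classes=3, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # Fit both classifiers and compare their test accuracy.
  scores = {}
  for model in (DecisionTreeClassifier(random_state=0),
                RandomForestClassifier(n_estimators=100, random_state=0)):
      model.fit(X_train, y_train)
      scores[type(model).__name__] = model.score(X_test, y_test)
  print(scores)
  ```

  The random forest usually outperforms the single tree here, since averaging many decorrelated trees reduces variance.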
- In this project, I used linear, lasso, and ridge regression models, decision tree and random forest regressors, and SVM regression to predict the quality of white and red wines.
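  The model comparison described above can be sketched as follows; this uses a generated regression dataset in place of the wine data, so the numbers are illustrative only:

  ```python
  from sklearn.datasets import make_regression
  from sklearn.ensemble import RandomForestRegressor
  from sklearn.linear_model import Lasso, LinearRegression, Ridge
  from sklearn.metrics import r2_score
  from sklearn.model_selection import train_test_split
  from sklearn.svm import SVR
  from sklearn.tree import DecisionTreeRegressor

  # Synthetic stand-in for the wine data (11 physicochemical features).
  X, y = make_regression(n_samples=400, n_features=11, noise=10.0, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  models = {
      "linear": LinearRegression(),
      "lasso": Lasso(alpha=1.0),
      "ridge": Ridge(alpha=1.0),
      "tree": DecisionTreeRegressor(random_state=0),
      "forest": RandomForestRegressor(n_estimators=100, random_state=0),
      "svr": SVR(),
  }
  # Fit each model and score it on held-out data with R^2.
  r2 = {name: r2_score(y_test, m.fit(X_train, y_train).predict(X_test))
        for name, m in models.items()}
  best = max(r2, key=r2.get)
  ```

  Comparing a single held-out metric like this is the simplest version of model selection; cross-validation would give a more robust ranking.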
- In this project, my group and I investigated hate crime data across multiple official data sources.
- In this report, I looked into the demographics of food access (income, distance, age, race, etc.), its geographic distribution, and its correlation with poverty, and I proposed a simple solution. The project was for a data science course with a page limit, which is why the graphs are clumped together; the main focus was investigation rather than visualization.
- In this project, we were asked to create two visualizations: one had to obscure the facts in the data, while the other had to represent the data truthfully.
- In this report, I analyzed and visualized Disney's movie patterns and tried to explain why those patterns changed over time.
- Solving the Enigma and knapsack problems using simple Python code
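  For the knapsack part, a standard 0/1 dynamic-programming solution looks like the sketch below (the item weights and values in the example are illustrative, not from the exercise):

  ```python
  def knapsack(weights, values, capacity):
      """Return the maximum total value achievable within the capacity."""
      best = [0] * (capacity + 1)  # best[c] = best value using capacity c
      for w, v in zip(weights, values):
          # Iterate capacities downward so each item is used at most once.
          for c in range(capacity, w - 1, -1):
              best[c] = max(best[c], best[c - w] + v)
      return best[capacity]

  # Items (weight, value) = (2, 3), (3, 4), (4, 5) with capacity 5 -> 7
  print(knapsack([2, 3, 4], [3, 4, 5], 5))
  ```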
- Practicing with simple linear regression (small number of features)
- Practicing simple linear regression, diving deep into the sklearn libraries
- Practicing using more complex regression models and choosing the best fit model by looking at regression metrics
Individual and class projects written in Python, R, HTML, and SQL
- This notebook demonstrates basic techniques for analyzing different data types, including: data overview (understanding column data types, values, and distributions), data cleaning (removing missing values, outlier detection), data transformation (normalization, tokenization, lemmatization), feature engineering (encoding categorical data, text feature representation), and examining interactions between columns (collinearity checks)
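  A compressed sketch of that cleaning-to-encoding flow with pandas; the frame, its columns, and its values are made up for illustration:

  ```python
  import pandas as pd

  # Tiny illustrative frame standing in for the notebook's dataset.
  df = pd.DataFrame({
      "age": [25, 32, None, 41],
      "city": ["NYC", "LA", "NYC", None],
  })

  # Data cleaning: drop rows with missing values.
  clean = df.dropna()

  # Data transformation: min-max normalize the numeric column.
  clean = clean.assign(
      age_norm=(clean["age"] - clean["age"].min())
               / (clean["age"].max() - clean["age"].min()))

  # Feature engineering: one-hot encode the categorical column.
  encoded = pd.get_dummies(clean, columns=["city"])
  ```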
- Basic SQL practice using a simple dataset
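  The flavor of that practice can be reproduced with Python's built-in `sqlite3` module and an in-memory database; the table and rows below are illustrative, not the dataset from the notebook:

  ```python
  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE songs (title TEXT, genre TEXT, plays INTEGER)")
  conn.executemany("INSERT INTO songs VALUES (?, ?, ?)",
                   [("A", "pop", 120), ("B", "rock", 80), ("C", "pop", 60)])

  # Aggregate plays per genre, most-played first.
  rows = conn.execute(
      "SELECT genre, SUM(plays) FROM songs "
      "GROUP BY genre ORDER BY SUM(plays) DESC"
  ).fetchall()
  print(rows)  # [('pop', 180), ('rock', 80)]
  ```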
- Practice with dictionaries using basic built-in Python commands (printing the word count of a given string, finding the frequency of specified strings in a given dictionary, finding pangrams and missing letters, practicing list comprehensions, and practicing access to dictionary keys and values)
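  Two of those exercises, sketched in plain Python (function names are my own, not the notebook's):

  ```python
  import string

  def word_count(text):
      """Map each word in the string to how often it appears."""
      counts = {}
      for word in text.lower().split():
          counts[word] = counts.get(word, 0) + 1
      return counts

  def missing_letters(text):
      """Letters of the alphabet absent from the text (empty for a pangram)."""
      return sorted(set(string.ascii_lowercase) - set(text.lower()))
  ```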
- Practicing basic Python classes and building trees using classes
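  A minimal n-ary tree node in the spirit of those class exercises (the class and method names are illustrative):

  ```python
  class TreeNode:
      def __init__(self, value):
          self.value = value
          self.children = []

      def add_child(self, value):
          """Create a child node, attach it, and return it for chaining."""
          child = TreeNode(value)
          self.children.append(child)
          return child

      def count(self):
          """Total number of nodes in the subtree rooted here."""
          return 1 + sum(child.count() for child in self.children)

  root = TreeNode("root")
  a = root.add_child("a")
  a.add_child("a1")
  root.add_child("b")
  print(root.count())  # 4
  ```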
- Practicing tuples and classes. Using two different datasets, I created a class that matches the common values in the two datasets by calculating a match / unmatch probability
- Practicing with the BeautifulSoup package. I created a web crawler: starting from an initial link in the University of Chicago course catalogue, the code builds course-to-keyword pairings from each course's title and description. It continues until every listed course is covered, following the relative links found on the initial and subsequently crawled pages.
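  The real script crawls the live catalogue; the sketch below parses a hardcoded snippet instead, so the tag names and CSS classes are assumptions about the page layout, not the actual catalogue HTML:

  ```python
  from bs4 import BeautifulSoup

  # Hypothetical fragment of a catalogue page (structure is an assumption).
  html = """
  <div class="courseblock">
    <p class="courseblocktitle">CAPP 30121. Computer Science with Applications.</p>
    <p class="courseblockdesc">Introduction to programming and data structures.</p>
  </div>
  """
  soup = BeautifulSoup(html, "html.parser")
  pairs = {}
  for block in soup.find_all("div", class_="courseblock"):
      title = block.find("p", class_="courseblocktitle").get_text(strip=True)
      desc = block.find("p", class_="courseblockdesc").get_text(strip=True)
      # Pair the course with simple keywords drawn from its description.
      pairs[title] = [w.strip(".").lower() for w in desc.split() if len(w) > 4]
  ```

  In the full crawler, the same parsing step would run on each page fetched (e.g. with `requests`), with relative links resolved against the starting URL before being queued.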
- Re-creating the University of Chicago course registration system as an exercise in Python classes
- Practicing with PyTorch packages and BERT models
- Practicing with BERT Base Uncased (freezing some of its layers) and basic NLP models
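  Loading the actual BERT Base Uncased checkpoint needs the `transformers` package and a model download, so this sketch applies the same layer-freezing idea to a small stand-in encoder built with plain PyTorch:

  ```python
  import torch.nn as nn

  # Stand-in "transformer layers" (BERT Base has 12; this toy encoder has 4).
  encoder = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])

  # Freeze the first two layers: their parameters stop receiving gradients,
  # so only the upper layers are fine-tuned.
  for layer in list(encoder)[:2]:
      for param in layer.parameters():
          param.requires_grad = False

  trainable = [p for p in encoder.parameters() if p.requires_grad]
  ```

  With the real model the loop would instead run over something like `model.bert.encoder.layer[:2].parameters()` (attribute path assumed from the `transformers` BERT layout).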
- Practicing data visualization with Seaborn, Altair, and Matplotlib