This repository contains a collection of Python-based projects focusing on Natural Language Processing (NLP) and Advanced Data Analytics.
The first project analyzes the linguistic proximity between 12 distinct Pashto dialects.
- Algorithm: Uses a custom implementation of Levenshtein edit distance to measure word similarity.
- Visualizations: Generates a Hierarchical Clustering Dendrogram and a geographic similarity map using Matplotlib and SciPy.
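
The edit-distance step can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names (`levenshtein`, `normalized_distance`, `dialect_distance`) and the length normalization are assumptions, and the word lists are expected to be aligned so that each index holds the same concept in both dialects.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def dialect_distance(words_a, words_b) -> float:
    """Mean normalized distance over aligned word lists (hypothetical helper)."""
    pairs = list(zip(words_a, words_b))
    return sum(normalized_distance(x, y) for x, y in pairs) / len(pairs)
```

Computing `dialect_distance` for every pair of dialects gives a symmetric distance matrix. One plausible way to get from there to a dendrogram is to convert the matrix with `scipy.spatial.distance.squareform` and pass the result to `scipy.cluster.hierarchy.linkage` and `dendrogram`, though the linkage method used in this project is not stated here.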
The second project is a data science workflow for cleaning and analyzing weather event datasets.
- Data Cleaning: Automated handling of missing values and datetime feature engineering.
- Analysis: Includes outlier detection via IQR, correlation heatmaps, and geospatial event mapping.
- Encoding: Uses Label Encoding for categorical variables like 'Severity'.
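
The IQR outlier rule and label encoding mentioned above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: the helper names `iqr_outliers` and `label_encode`, the conventional 1.5 multiplier, and the example severity labels are all hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def label_encode(labels):
    """Map each distinct label to an integer code, assigned in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Hypothetical sample data, for illustration only.
readings = [10, 12, 11, 13, 12, 11, 95]
severity = ["Severe", "Minor", "Severe", "Moderate"]

outliers = iqr_outliers(readings)
codes, mapping = label_encode(severity)
```

Note that codes assigned in sorted order are alphabetical, so they do not necessarily reflect the natural ranking of an ordinal variable such as 'Severity'. If the ordering matters downstream, an explicit mapping may be a better fit.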