
heba14101998/Data-Science-with-Python


Python Notebooks for Data Science


This repository offers a comprehensive collection of Python notebooks that serve as a detailed guide to key topics in statistics and machine learning for data science applications.

Each notebook includes clear explanations, practical examples, and hands-on coding exercises to reinforce learning. Designed for beginners and learners at any level, these resources cover foundational concepts, step-by-step implementations from scratch, and real-world applications using Scikit-learn.

Whether you're new to data analysis or looking to enhance your machine learning skills, this repository provides a structured and practical approach to mastering these techniques.

Used Datasets

This repository includes small to medium-sized datasets in the datasets directory; larger datasets are excluded due to size constraints. For each excluded dataset, the source URL is provided in a text file with the same name as the dataset. Download the missing datasets from those URLs and place them in the datasets directory so that the code runs properly. If you run into problems, feel free to open an issue.
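To see which datasets still need to be downloaded, a check along the following lines may help. It is a minimal sketch, not part of the repository: the function name `find_missing_datasets` is hypothetical, and it assumes that each URL file ends in `.txt` and that the downloaded dataset shares its base name (either `name.txt` next to `name.csv`, or `name.csv.txt` next to `name.csv`).

```python
from pathlib import Path


def find_missing_datasets(datasets_dir="datasets"):
    """Return (name, url) pairs for URL text files that have no matching dataset file."""
    root = Path(datasets_dir)
    missing = []
    for url_file in sorted(root.glob("*.txt")):
        # Assumption: a dataset counts as present if a non-.txt file in the
        # same directory has the same full name or the same base name.
        present = any(
            p.is_file()
            and p.suffix != ".txt"
            and url_file.stem in (p.name, p.stem)
            for p in root.iterdir()
        )
        if not present:
            url = url_file.read_text(encoding="utf-8").strip()
            missing.append((url_file.stem, url))
    return missing


if __name__ == "__main__":
    for name, url in find_missing_datasets():
        print(f"Missing dataset '{name}': download it from {url}")
```

Run it from the repository root; it prints nothing when every URL file already has a matching dataset.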

Contents

The notebooks are organized into the following categories:

[101] Data Exploration and Preprocessing: Notebooks with the 101 prefix cover the fundamentals for machine learning and data science.

  • 101-01-linear-algebra.ipynb: Covers the fundamentals of linear algebra, including vectors, matrices, and operations.
  • 101-02-descriptive-statistics.ipynb: Covers descriptive statistics, data visualization, and understanding key metrics for data analysis.
  • 101-03-probability-distributions.ipynb: Introduces basic probability theory, random variables, and distributions.
  • 101-04-T-testing-and-error-types: Explains hypothesis testing, confidence intervals, and t-tests, and shows how to perform them in Python.

[202] Feature Engineering and Selection: Notebooks with the 202 prefix cover feature engineering and selection for machine learning and data science.

  • 202-01-feature-selection-part1.ipynb: Assesses the relationship between features and the target variable, and provides a comprehensive guide to feature selection methods, including filter, wrapper, and embedded techniques.

  • 202-02-feature-selection-part2.ipynb: Explores feature engineering techniques to improve model performance, covers different feature selection methods, and demonstrates their implementation.

  • 202-03-feature-extraction-(pca-t.sne-umap).ipynb [under development]: A comprehensive guide to dimensionality reduction techniques, focusing on PCA, t-SNE, and UMAP. It includes theoretical explanations, step-by-step implementations from scratch, and practical applications using Scikit-learn.

[303] Machine Learning Algorithms:

  • 303-01-linear-regression.ipynb: Introduces simple linear regression, multiple linear regression, and polynomial regression, and discusses the assumptions of linear regression.
  • 303-02-regression-analysis-and-glmm.ipynb: Expands on regression analysis, including generalized linear models (GLMs).
  • 303-03-logistic-regression.ipynb: Covers binary classification using logistic regression and various performance metrics, using this book as a reference.
  • 303-04-support-vector-machine-svc.ipynb: Explores Support Vector Machines (SVMs), a versatile algorithm for both classification and regression tasks.
  • 303-05-support-vector-machine-svr.ipynb: Coming soon.
  • 303-06-descion-tree.ipynb: Coming soon.

Getting Started

Clone the repository:

git clone https://github.com/heba14101998/Hands-on-Data-with-Python.git

Usage

To use these notebooks, you'll need Python along with common data science libraries such as NumPy, Pandas, Matplotlib, and Scikit-Learn. Each notebook includes the import statements it needs. You can view the notebooks statically on GitHub, but to modify and re-run the code you'll need to clone or download the repository locally.
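Before opening a notebook, it can be useful to confirm that the libraries named above are installed. The snippet below is a minimal sketch that uses only the standard library. The helper name `missing_packages` is hypothetical, and the list covers only the four libraries mentioned here, so individual notebooks may import others.

```python
from importlib.util import find_spec

# Import names of the libraries named above; "sklearn" is the import name of Scikit-Learn.
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are installed.")
```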

Acknowledgments

I’d like to acknowledge Hamel, G.'s fantastic work on Kaggle. Their resource provided valuable insights and inspired parts of my notebooks.

Contributions

While I aim to ensure performance and code quality, these notebooks may have errors. If you spot any issues or have suggestions to improve the content, please feel free to submit a pull request or open an issue.

License

This project is licensed under the MIT License. See the LICENSE file for details.
