
Guide for Data Science Projects in 2023

This repository explains how to define a problem statement and perform the tasks needed to deliver meaningful results: a walkthrough of a structured approach to completing a data science project successfully.

How to start?

Find/define a problem statement: The problem statement is the critical first step in any data science project. It provides a clear definition of the problem to be solved and guides the development of the research questions and hypotheses that lead to a well-defined solution.

How to find the scope of the problem, and where to begin?

One approach I personally use to start and solve any case scenario is the method below, structuring a solution accordingly.

  • Clarify:
    • Define the scope of the problem
  • Constrain:
    • Refine the problem by setting boundaries and parameters
  • Plan:
    • Frame your response
    • Data-gathering solutions
    • Collaboration solutions if needed
    • Identify the statistical analyses to be implemented
  • Method:
    • Data Quality Inspection and EDA
    • Data Pre-processing and Feature Engineering (Data Cleaning, Data Integration, Data Reduction & Data Transformation)
    • Feature Selection Methodologies
    • Model Selection & Creation
    • Model Evaluation and Hyper-Parameter Tuning
    • Result Visualization
    • Model Deployment
  • Conclude:
    • Provide a conclusion by explaining the significance of the problem statement and the decisions to be taken based on the results
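The Method stage above can be sketched end-to-end with scikit-learn. This is a minimal illustration, not part of the original guide: the dataset, estimator, and grid values are assumptions chosen for brevity.

```python
# A minimal sketch of the Method stage: transformation, feature
# selection, model creation, evaluation, and hyper-parameter tuning.
# Dataset and model choices here are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),                    # data transformation
    ("select", SelectKBest(f_classif)),             # feature selection
    ("model", LogisticRegression(max_iter=1000)),   # model creation
])

# Hyper-parameter tuning via cross-validated grid search
grid = GridSearchCV(
    pipe,
    {"select__k": [5, 10, 20], "model__C": [0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X_train, y_train)

# Model evaluation on the held-out test split
print(f"test accuracy: {grid.score(X_test, y_test):.3f}")
```

Bundling the steps into a single `Pipeline` keeps the preprocessing inside the cross-validation loop, which avoids leaking test data into scaling and feature selection.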

Stages involved in solving a typical data science problem

The links provided in this README are crucial for obtaining detailed explanations of the concepts and keywords mentioned. Please follow them to gain a deeper understanding of each topic.

Data science workflow


Data Analysis


  • Feature Engineering:

    • Perform appropriate data analysis as mentioned above to derive a meaningful set of attributes
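As a small illustration of deriving attributes from raw data, the sketch below uses pandas; the column names and engineered features are hypothetical examples, not from the original guide.

```python
# Hypothetical customer data; deriving features that may carry
# more signal than the raw columns.
import pandas as pd

df = pd.DataFrame({
    "signup_date": pd.to_datetime(["2023-01-05", "2023-02-10", "2023-03-15"]),
    "purchases": [3, 0, 7],
    "total_spent": [120.0, 0.0, 560.0],
})

# Extract a calendar component from a timestamp
df["signup_month"] = df["signup_date"].dt.month

# Ratio feature; clip the denominator to guard against division by zero
df["avg_order_value"] = df["total_spent"] / df["purchases"].clip(lower=1)

# Binary indicator feature
df["is_active"] = (df["purchases"] > 0).astype(int)

print(df[["signup_month", "avg_order_value", "is_active"]])
```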


  • Model Deployment:

  • Once the trained model is ready, deploy it for the corresponding use-case scenarios, then monitor it and retrain it as needed
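A common first step toward deployment is persisting the trained model so a serving process can load it later. A minimal sketch with the standard-library `pickle` module, assuming an illustrative iris classifier and file name:

```python
# Persist a trained model to disk, then load it back for serving.
# The model and file name are illustrative assumptions.
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize the fitted model
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# At serving time, load it back and predict
with open("model.pkl", "rb") as f:
    served = pickle.load(f)

print(served.predict(X[:1]))
```

In practice, `joblib.dump`/`joblib.load` is often preferred for scikit-learn models with large NumPy arrays, but the workflow is the same.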

Local deployment

  • Run locally on a machine or server. The model can also be exposed using frameworks such as:
    • Streamlit
    • Django
    • Flask
    • Express.JS

Cloud deployment

  • Amazon Web Services (AWS)
  • Google Cloud Platform (GCP)
  • Microsoft Azure

Containerization

  • Docker
  • Kubernetes

Using APIs

  • Frameworks like Flask, Express.JS, FastAPI
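A minimal sketch of serving predictions behind an HTTP API with Flask (one of the frameworks listed above); the endpoint name, payload shape, and stand-in `predict` rule are illustrative assumptions, not a real trained model.

```python
# Expose a prediction function behind a JSON HTTP endpoint with Flask.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    # Stand-in for a real trained model's predict method
    # (hypothetical threshold rule for illustration only)
    return 1 if sum(features) > 10 else 0

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    payload = request.get_json()
    return jsonify({"prediction": predict(payload["features"])})

# app.run(port=5000)  # uncomment to start the development server locally
```

A client would then POST `{"features": [5.0, 6.0, 7.0]}` to `/predict` and receive `{"prediction": 1}` back as JSON.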

Prerequisites needed to solve problems:

Below are a few prerequisites that I personally consider very useful for solving a use-case scenario. Explore the packages by following the accompanying links for detailed installation and usage instructions.

1. Python Programming Language


2. Statistics


3. Databases


4. Visualization tools


References:

Textbooks:

  • Introduction to Algorithms for Data Mining and Machine Learning (Xin-She Yang)
  • An Introduction to Statistical Learning (James, Witten, Hastie & Tibshirani)
  • Making Sense of Data II: A Practical Guide to Data Visualization, Advanced Data Mining Methods, and Applications (Myatt & Johnson)
  • Data Science Interview Guide (ACE-PREP)

Coursework / Tutorials: