gitmystuff/DSRoadmap

DS Roadmap

Resources for a Data Science Roadmap

Project Overview

This project provides a glossary of data science terms and definitions, organized by module. The glossary is distributed as a CSV file that can be opened in spreadsheet software or loaded into a Python notebook for sorting and similarity search.

Key Features

HTML Viewer: Open html_preview_links.md to access the overview for each module used in this project. The code is located in the relevant class repository.

Spreadsheet Usage: The CSV file can be seamlessly integrated with spreadsheet software like Google Sheets or Microsoft Excel. This enables users to effortlessly view, sort, and filter the glossary by terms, modules, and definitions, facilitating quick access to specific information.
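
For readers who prefer code to a spreadsheet, the same view, sort, and filter operations can be sketched with Python's standard library. The column names (`term`, `module`, `definition`) and the sample rows below are assumptions made for illustration; the actual CSV in this repository may use different headers.

```python
import csv
import io

# Hypothetical sample standing in for the glossary CSV; the real column
# names and rows may differ.
SAMPLE_CSV = """term,module,definition
Variance,Statistics,Average squared deviation from the mean
Gradient Descent,Machine Learning,Iterative method for minimizing a loss function
Bias,Statistics,Systematic error of an estimator
"""


def load_glossary(handle):
    """Read glossary rows from an open CSV file into a list of dicts."""
    return list(csv.DictReader(handle))


def sort_by(rows, column):
    """Return the rows sorted case-insensitively by the given column."""
    return sorted(rows, key=lambda row: row[column].lower())


def filter_by_module(rows, module):
    """Return only the rows that belong to the given module."""
    return [row for row in rows if row["module"].lower() == module.lower()]


rows = load_glossary(io.StringIO(SAMPLE_CSV))
by_term = sort_by(rows, "term")
stats_only = filter_by_module(rows, "statistics")

print([row["term"] for row in by_term])
print([row["term"] for row in stats_only])
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an `open(...)` call on the glossary CSV and adjust the column names to match its header row.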

Python Notebook Integration: For users seeking more advanced functionalities, the project provides a Python notebook that offers:

  • Sorting: Sort the glossary by terms or modules, enabling customized organization of the data.
  • Similarity Search: Leverage pre-calculated vector embeddings to identify semantically similar terms. This feature allows users to explore related concepts and discover connections within the glossary.
  • Visit Modules/Glossary.ipynb for examples
  • Example Usage:
    • Go to Modules/Glossary.ipynb
    • Click on Open in Colab
    • Select Runtime > Run all
    • Scroll to the bottom and wait for the LLM to load
    • Enter a similarity search term in the prompt when it becomes available
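
The idea behind the similarity search can be sketched as a cosine-similarity ranking over pre-calculated vectors. This is a minimal illustration, not the notebook's code: the three-dimensional embeddings are made up, the function names are hypothetical, and the query is looked up among terms already in the table, whereas the notebook embeds a free-text search term with a model.

```python
import math

# Hypothetical pre-calculated embeddings. Real sentence embeddings would
# come from a model and typically have hundreds of dimensions.
EMBEDDINGS = {
    "regression": [0.9, 0.1, 0.0],
    "classification": [0.8, 0.3, 0.1],
    "histogram": [0.1, 0.9, 0.2],
    "box plot": [0.0, 0.8, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query_term, embeddings, top_k=2):
    """Rank every other term by similarity to query_term, best first."""
    query = embeddings[query_term]
    scored = [
        (term, cosine_similarity(query, vector))
        for term, vector in embeddings.items()
        if term != query_term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = most_similar("regression", EMBEDDINGS, top_k=2)
for term, score in results:
    print(f"{term}: {score:.3f}")
```

With these toy vectors, "classification" ranks above "histogram" for the query "regression" because its vector points in nearly the same direction.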

Additional Resources

For data science enthusiasts seeking further learning and development opportunities, the project also includes a curated list of resources:

General Data Science Topics:

freeCodeCamp:

YouTube:

Machine Learning:

Deep Learning:

Additional Notes

  • Installation: Ensure you have the necessary Python libraries installed, including pandas, scikit-learn, and transformers.
  • Model: The Python notebook might utilize a specific pre-trained sentence transformer model. Feel free to experiment with different models from the Hugging Face Model Hub to observe their impact on the similarity search results.
  • Customization: Tailor the code and parameters, such as the number of similar terms to return, to align with your specific requirements and preferences.
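
Before running the notebook locally, it may help to confirm that the libraries named above are importable. The sketch below uses only the standard library; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced here, and the list covers only the three libraries this README mentions.

```python
import importlib.util

# Install name -> import name. scikit-learn is installed under that name
# but imported as `sklearn`.
REQUIRED = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "transformers": "transformers",
}


def missing_packages(required):
    """Return the install names of packages whose import name is not found."""
    return [
        install_name
        for install_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any names reported as missing can then be installed with your package manager of choice. In Colab, some of these libraries are usually preinstalled, so the check mainly matters for local environments.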

This project offers a versatile and robust approach to exploring and analyzing glossary data. By combining the user-friendliness of a spreadsheet with the advanced capabilities of Python and sentence embeddings, users can gain a deeper understanding of terms, their definitions, and their relationships within the context of data science.
