A quick summary about what is data science and where to start

gigigrin/About_DataScience

 
 

Data Science Overview:

What is Data Science?

Data science is the study of data. It involves developing methods of recording, storing, and analyzing data to effectively extract useful information. The goal of data science is to gain insights and knowledge from any type of data, both structured and unstructured. Data science is related to computer science, but it is a separate field. Computer science involves creating programs and algorithms to record and process data, while data science covers any type of data analysis, which may or may not use computers. Data science is more closely related to the mathematical field of statistics, which includes the collection, organization, analysis, and presentation of data. Because of the large amounts of data that modern companies and organizations maintain, data science has become an integral part of IT. For example, a company that has petabytes of user data may use data science to develop effective ways to store, manage, and analyze that data. The company may use the scientific method to run tests and extract results that provide meaningful insights about its users.

In simple words, data science is the process of collecting and analyzing large amounts of data using statistics, mathematics, and machine learning. By applying these methods, a data scientist can discover new information and trends in the data. Its uses vary, from helping a company gain a marketing advantage to solving worldwide problems.

What you should know before starting

Data science is a huge field with an endless amount to learn and discover. But, like any other field, it has basic beginner steps that everyone should follow before diving deeper:

Step 1: Get comfortable with Python:

  • Although many programming languages are used in data science, I recommend Python because it keeps growing in popularity in the industry.
  • Focus on learning one language and its ecosystem of data science packages. Consider installing the Anaconda distribution, because it simplifies package installation and management on Windows, macOS, and Linux.
  • You don't need to become a Python expert to move on to step 2. Instead, you should focus on mastering the following: data types, data structures, imports, functions, conditional statements, comparisons, loops, and comprehensions. Everything else can wait until later!
  • Get comfortable with the Jupyter Notebook and the Spyder IDE (my recommendation)
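
As a rough illustration of the basics listed above, the sketch below touches each of them in a few lines: data types, data structures, an import, a function, conditional statements, comparisons, a loop, and comprehensions. The names (`temperatures_c`, `describe`) and the sample values are made up for this example.

```python
from statistics import mean  # an import from the standard library

# Data types and data structures
city = "Oslo"                                        # str
temperatures_c = [3.5, -1.0, 7.2, 0.0, 12.4]         # list of floats
record = {"city": city, "readings": temperatures_c}  # dict


def describe(temp_c):
    """Label a temperature using conditional statements and comparisons."""
    if temp_c <= 0:
        return "freezing"
    elif temp_c < 10:
        return "cold"
    return "mild"


# A loop that builds a list step by step
labels = []
for t in record["readings"]:
    labels.append(describe(t))

# The same result written as a list comprehension
labels_short = [describe(t) for t in record["readings"]]

# A comprehension with a filter: keep only readings above freezing
above_zero = [t for t in temperatures_c if t > 0]

average = mean(temperatures_c)
print(city, labels, above_zero, round(average, 2))
```

If every line here reads naturally to you, you probably know enough Python to move on to step 2.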

Sources:

  • Sentdex (Python tutorial): Here

Step 2: Statistics

With basic knowledge of statistics, you will know how to work with different types of data, how to plot and visualize them, how to calculate correlation and covariance, and how to make data-driven decisions.
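
To make correlation and covariance a bit more concrete, here is a minimal sketch that computes both from scratch in plain Python. It assumes the sample covariance (dividing by n - 1) and the Pearson correlation; the function names and the `hours_studied` / `exam_score` values are invented for illustration.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Made-up sample data
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]

print(covariance(hours_studied, exam_score))
print(correlation(hours_studied, exam_score))
```

Covariance tells you whether two variables move in the same direction, but its size depends on their units. Correlation rescales it to a value between -1 and 1, which makes different pairs of variables comparable.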

Sources:

I highly recommend this course on Udemy and this YouTube channel.

Step 3: Learn data analysis, manipulation, and visualization

  • For working with data in Python, you should learn how to use the pandas library. pandas provides a high-performance data structure (called a "DataFrame") that is suitable for tabular data with columns of different types, similar to an Excel spreadsheet or SQL table. It includes tools for reading and writing data, handling missing data, filtering data, cleaning messy data, merging datasets, visualizing data, and so much more. In short, learning pandas will significantly increase your efficiency when working with data.
  • For data plotting and visualization I suggest learning the matplotlib and seaborn libraries as they contain many different types of plots and visualization structures.
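
The pandas operations mentioned above (handling missing data, merging datasets, filtering) might look roughly like this in practice. This is only a sketch: the `sales` and `stores` tables, their column names, and the choice to fill gaps with the median are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data; in practice this often comes from pd.read_csv(...)
sales = pd.DataFrame({
    "store_id": [1, 2, 3, 4],
    "revenue": [1200.0, None, 860.0, 1530.0],
})
stores = pd.DataFrame({
    "store_id": [1, 2, 3, 4],
    "region": ["north", "south", "north", "south"],
})

# Handle missing data: fill the missing revenue with the column median
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].median())

# Merge two datasets on a shared key, similar to a SQL join
merged = sales.merge(stores, on="store_id", how="left")

# Filter rows with a boolean condition
high_revenue = merged[merged["revenue"] > 1000]

# Aggregate: total revenue per region
revenue_by_region = merged.groupby("region")["revenue"].sum()

print(high_revenue)
print(revenue_by_region)
```

With matplotlib installed, calling `revenue_by_region.plot(kind="bar")` should draw a simple bar chart of the result, which is an easy first step toward the plotting libraries mentioned above.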

Sources:

  • Sentdex (pandas tutorial): Here

Step 4: Learn Machine Learning

This is the most fascinating and interesting part of every data science project! For that reason, I will leave it to you to explore, discover, and get fascinated on your own.

Sources:

Step 5: Start getting your hands dirty

Theoretical knowledge is great and fascinating, but there is nothing more fascinating than putting theory into practice and expanding your knowledge!

  • Start working on your own projects
  • Explore Deep Learning (a more advanced field of machine learning)
  • Explore other people's projects
  • Read related articles

Before I end this tutorial, I want to give you one more tip: LEARN TO GOOGLE! Google and documentation are a data scientist's best friends. As you work on your projects, you'll run into many difficulties and problems that tutorials won't cover. Also remember that as a data scientist you need to focus less on "remembering" the code and more on solving problems and answering questions in the best way possible. Don't be afraid to copy and use someone else's code (as long as it helps you). ALWAYS GIVE CREDIT IF YOU USE SOMEONE ELSE'S CODE/RESOURCES!

I hope this little tutorial helped you, and good luck!
