Data science is the study of data. It involves developing methods of recording, storing, and analyzing data to effectively extract useful information. The goal of data science is to gain insights and knowledge from any type of data, both structured and unstructured.
Data science is related to computer science, but is a separate field. Computer science involves creating programs and algorithms to record and process data, while data science covers any type of data analysis, which may or may not use computers. Data science is more closely related to the mathematical field of statistics, which includes the collection, organization, analysis, and presentation of data.
Because of the large amounts of data that modern companies and organizations maintain, data science has become an integral part of IT. For example, a company that has petabytes of user data may use data science to develop effective ways to store, manage, and analyze that data. The company may use the scientific method to run tests and extract results that provide meaningful insights about its users.
In simple words, data science is the process of collecting and analyzing large amounts of data using statistics, mathematics, and machine learning. By applying these methods, a data scientist can discover new information and trends in the data. The uses of data science vary, from helping a company gain a marketing advantage to solving worldwide problems.
Data science is a huge field with an endless amount to learn and discover. But like any other field, it has its own basic beginner steps that everyone should follow before diving deeper into it:
- Although many programming languages are used in data science, I recommend Python, as it keeps growing in popularity across the industry.
- Focus on learning one language and its ecosystem of data science packages. (Consider installing the Anaconda distribution, because it simplifies package installation and management on Windows, macOS, and Linux.)
- You don't need to become a Python expert to move on to step 2. Instead, you should focus on mastering the following: data types, data structures, imports, functions, conditional statements, comparisons, loops, and comprehensions. Everything else can wait until later!
- Get comfortable with Jupyter Notebook and the Spyder IDE (my recommendation).
- Sentdex: Python tutorial
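To give a feel for how small that list of basics really is, here is a minimal sketch that touches each item once. The names and values (`ages`, `prices`, `describe_age`) are made up purely for illustration.

```python
import math  # imports: bring in a standard-library module

# Data types and data structures (illustrative values only)
name = "Ada"                           # str
ages = [31, 18, 45, 27]                # list of int
prices = {"apple": 0.5, "pear": 0.8}   # dict mapping str -> float


def describe_age(age):
    """Return a label for an age using conditionals and comparisons."""
    if age < 20:
        return "young"
    elif age < 40:
        return "adult"
    return "senior"


# Loop: build a list of labels one element at a time
labels = []
for age in ages:
    labels.append(describe_age(age))

# Comprehension: filter and transform a list in a single expression
roots = [round(math.sqrt(a), 2) for a in ages if a >= 21]

print(name, labels)
print(roots)
print(sorted(prices))  # iterating a dict yields its keys
```

If you can read this snippet and predict what it prints, you probably know enough Python to move on.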
With a basic knowledge of statistics, you will know how to work with different types of data, how to plot and visualize them, how to calculate correlation and covariance, and how to make data-driven decisions.
I highly recommend this course on Udemy and also this YouTube channel.
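As a rough illustration of the covariance and correlation mentioned above, here is a small sketch in plain Python. It assumes the sample covariance (dividing by n - 1) and the Pearson correlation; the study-hours data is invented for the example.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance: average product of deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Made-up data: hours studied and the resulting exam score
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]

cov = covariance(hours_studied, exam_score)
corr = correlation(hours_studied, exam_score)

print(cov)   # 13.75
print(corr)  # close to 1: the two variables rise together
```

Covariance tells you the direction of the relationship but depends on the units of the data. Correlation rescales it to the range -1 to 1, which makes it easier to compare across datasets. In practice you would usually let a library compute these for you.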
- For working with data in Python, you should learn how to use the pandas library. pandas provides a high-performance data structure (called a "DataFrame") that is suitable for tabular data with columns of different types, similar to an Excel spreadsheet or SQL table. It includes tools for reading and writing data, handling missing data, filtering data, cleaning messy data, merging datasets, visualizing data, and so much more. In short, learning pandas will significantly increase your efficiency when working with data.
- For data plotting and visualization, I suggest learning the matplotlib and seaborn libraries, as they offer many different types of plots.
- Sentdex: pandas tutorial
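To make the pandas description above a bit more concrete, here is a minimal sketch of a few of the tasks it lists: handling missing data, filtering, aggregating, and merging. It assumes pandas is installed, and both tables are made up for illustration.

```python
import pandas as pd

# Made-up sales records; NaN marks a missing value
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Cairo"],
    "units": [10, float("nan"), 7, 3],
    "price": [2.5, 4.0, 2.5, 6.0],
})

# Handle missing data: here we choose to treat a missing count as zero
df["units"] = df["units"].fillna(0)

# Derive a new column from existing ones
df["revenue"] = df["units"] * df["price"]

# Filter rows with a boolean condition
big_sales = df[df["revenue"] > 20]

# Aggregate per group, similar to GROUP BY in SQL
revenue_by_city = df.groupby("city")["revenue"].sum()

# Merge with a second (also made-up) table, similar to a SQL left join
regions = pd.DataFrame({
    "city": ["Oslo", "Lima", "Cairo"],
    "region": ["Europe", "South America", "Africa"],
})
merged = df.merge(regions, on="city", how="left")

print(big_sales)
print(revenue_by_city)
print(merged)
```

Filling missing values with zero is only one option; depending on the data, dropping the rows or filling with an average may be more appropriate.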
This is the most fascinating and interesting part of every data science project! For that reason, I will leave you to explore, discover, and be fascinated on your own.
- Machine Learning Mastery: IMPORTANT
- Sentdex: ML Tutorial
- Udemy course: ML course
- Coursera (free): ML course
Theoretical knowledge is great, but nothing is more fascinating than putting theory into practice and expanding your knowledge!
- Start working on your own projects
- Explore deep learning (a more advanced area of machine learning)
- Explore other people's projects
- Read related articles
Before I end this tutorial, I want to give you one more tip: LEARN TO GOOGLE! Google and documentation are the data scientist's best friends. As you work on your projects, you'll run into many difficulties and problems that the tutorials won't cover. Also remember that as a data scientist you need to focus less on "remembering" the code and more on answering questions in the best way possible. Don't be afraid to copy and use someone else's code (as long as it helps you). ALWAYS GIVE CREDIT IF YOU USE SOMEONE ELSE'S CODE/RESOURCES!
I hope this little tutorial helped you, and good luck!