# Data Science Tools and Ecosystem

**Objectives:**

- List popular languages for Data Science.
- Introduce commonly used libraries by Data Scientists.
- Explore essential Data Science tools.
- Provide examples of evaluating arithmetic expressions in Python.
- Convert minutes to hours in Python.


## In this notebook, Data Science Tools and Ecosystem are summarized.

### Some of the popular languages that Data Scientists use are:

The choice of language depends on the task and on personal or team preference. Commonly used languages include:

1. **Python**: Python is perhaps the most popular language for data science. It has a rich ecosystem of libraries and frameworks like NumPy, pandas, scikit-learn, TensorFlow, and PyTorch that make it versatile for data analysis, machine learning, and deep learning tasks.

2. **R**: R is a language specifically designed for statistical analysis and data visualization. It's widely used in academic and research settings and has a strong community of statisticians and data scientists.

3. **SQL**: SQL (Structured Query Language) is essential for working with relational databases. Data scientists often use SQL to extract, transform, and analyze data stored in databases.

4. **Julia**: Julia is an emerging programming language that offers high performance for numerical and scientific computing. It's gaining popularity among data scientists for tasks that require intensive computations.

5. **Scala**: Scala is the language in which Apache Spark, a popular big data processing framework, is primarily written. Data scientists often write Spark applications in Scala for large-scale data processing and analysis.

6. **Java**: Java is used in various data science applications, particularly in big data processing and building large-scale data pipelines. Libraries like Deeplearning4j also make Java suitable for machine learning tasks.

7. **MATLAB**: MATLAB is widely used in academia and industry for numerical analysis and scientific computing. It has a comprehensive set of tools for data analysis and visualization.

8. **SAS**: SAS (Statistical Analysis System) is a software suite commonly used in industries like healthcare, finance, and government for advanced analytics and statistical modeling.

9. **Haskell**: Haskell is sometimes used in data science, particularly for tasks that require functional programming and a strong type system.

10. **JavaScript**: JavaScript, often with libraries like D3.js or Chart.js, is used for creating interactive data visualizations and web-based data applications.

11. **Ruby**: While not as common as Python or R, Ruby can be used for data analysis and visualization with libraries like Rubyvis and Numo::NArray.

12. **Perl**: Perl has a history of use in text processing and data manipulation tasks. It's still used in some data science workflows, especially for tasks involving regular expressions.
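
As a rough illustration of how two of the languages above can be combined, the sketch below runs a SQL aggregation from Python using the standard-library `sqlite3` module. The `measurements` table, its columns, and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, reading REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],  # made-up sample rows
)

# Extract and aggregate with SQL, then continue the analysis in Python.
rows = conn.execute(
    "SELECT sensor, AVG(reading) FROM measurements GROUP BY sensor ORDER BY sensor"
).fetchall()
conn.close()

print(rows)  # Output: [('a', 2.0), ('b', 10.0)]
```

In practice the query would typically target an existing database rather than one created on the spot.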


### Some of the libraries commonly used by Data Scientists include:

Data scientists rely on a wide range of libraries and frameworks for data analysis, machine learning, and data visualization. Commonly used ones include:

1. **NumPy**: NumPy is a fundamental library for numerical and array operations in Python. It provides support for multi-dimensional arrays and matrices, along with a wide range of mathematical functions.

2. **pandas**: pandas is a powerful data manipulation and analysis library for Python. It offers data structures like DataFrames and Series, making it easy to clean, transform, and explore data.

3. **scikit-learn**: scikit-learn is a popular machine learning library for Python. It provides a wide variety of machine learning algorithms, tools for model evaluation, and data preprocessing techniques.

4. **TensorFlow**: TensorFlow is an open-source machine learning framework developed by Google. It's particularly well-suited for deep learning tasks and neural network development.

5. **PyTorch**: PyTorch is another deep learning framework that's gained popularity due to its flexibility and dynamic computation graph. It's widely used for research and development in deep learning.

6. **Matplotlib**: Matplotlib is a Python library for creating static, animated, or interactive visualizations. It's commonly used for data visualization and plotting.

7. **Seaborn**: Seaborn is built on top of Matplotlib and provides a high-level interface for creating attractive and informative statistical graphics.

8. **ggplot2**: ggplot2 is an R package for creating elegant and customized data visualizations based on the Grammar of Graphics principles.

9. **Plotly**: Plotly is a versatile Python and JavaScript library for creating interactive and web-based visualizations, including interactive charts and dashboards.

10. **Statsmodels**: Statsmodels is a Python library for estimating and interpreting statistical models. It's often used for linear and non-linear regression, hypothesis testing, and time series analysis.

11. **NLTK (Natural Language Toolkit)**: NLTK is a Python library for natural language processing tasks, including tokenization, text classification, and sentiment analysis.

12. **spaCy**: spaCy is another Python library for natural language processing, with a focus on efficient and production-ready text processing pipelines.

13. **OpenCV**: OpenCV (Open Source Computer Vision Library) is a popular library for computer vision tasks, including image and video processing, object detection, and image recognition.

14. **XGBoost**: XGBoost is a scalable and efficient gradient boosting library that's widely used in machine learning competitions and predictive modeling tasks.

15. **LightGBM**: LightGBM is another gradient boosting framework that's known for its speed and efficiency, making it suitable for large-scale datasets.

16. **D3.js**: D3.js is a JavaScript library for creating interactive data visualizations in web browsers. It's often used for custom and dynamic data visualizations.

17. **Beautiful Soup**: Beautiful Soup is a Python library used for web scraping and parsing HTML and XML documents.

18. **SQLAlchemy**: SQLAlchemy is a Python library for working with SQL databases, providing an Object-Relational Mapping (ORM) interface for database interaction.

19. **Hadoop and Spark Libraries**: For big data processing, data scientists often use components of Apache Hadoop (e.g., HDFS for storage, MapReduce for batch processing) and libraries that ship with Apache Spark (e.g., Spark MLlib).
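
To make the role of an array library more concrete, here is a minimal sketch using NumPy. It assumes NumPy is installed, and the sample values are made up for illustration; it shows a column-wise mean followed by a broadcast subtraction.

```python
import numpy as np

# Made-up sample data: 2 rows and 3 columns of measurements.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

column_means = data.mean(axis=0)  # mean of each column
centered = data - column_means    # broadcasting subtracts the means from every row

print(column_means)    # Output: [2.5 3.5 4.5]
print(centered.sum())  # Output: 0.0
```

Operating on whole arrays like this, rather than looping over elements in Python, is the usual style with NumPy and with libraries built on it such as pandas.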

### Data Science Tools:

Data science involves a wide range of tools and software to perform data analysis, machine learning, data visualization, and other tasks. Here are some of the essential data science tools commonly used by data scientists:

1. **Jupyter Notebook**: Jupyter Notebook is an open-source web application that allows data scientists to create and share documents containing live code, equations, visualizations, and narrative text. It's widely used for data exploration and analysis.

2. **RStudio**: RStudio is an integrated development environment (IDE) for the R programming language. It provides a user-friendly interface for data analysis, visualization, and reporting.

3. **Visual Studio Code (VS Code)**: VS Code is a popular, free code editor from Microsoft, built on an open-source codebase, that can be customized with extensions for various data science tasks. It's versatile and supports multiple programming languages.

4. **PyCharm**: PyCharm is a Python-specific integrated development environment (IDE) that's well-suited for data science projects. It offers features like code completion, debugging, and project management.

5. **Spyder**: Spyder is an open-source scientific integrated development environment designed for data science and scientific computing in Python. It provides a MATLAB-like interface for data analysis.

6. **RapidMiner**: RapidMiner is a data science platform that offers tools for data preparation, machine learning, and predictive modeling through a user-friendly graphical interface.

7. **KNIME**: KNIME is an open-source platform for data analytics, reporting, and integration. It allows users to build data pipelines and perform data transformation, analysis, and visualization.

8. **Tableau**: Tableau is a powerful data visualization and business intelligence tool that allows data scientists to create interactive and shareable dashboards and reports.

9. **Power BI**: Microsoft Power BI is another business intelligence tool that helps data scientists and analysts create interactive visualizations and reports from various data sources.

10. **SAS**: SAS (Statistical Analysis System) is a comprehensive software suite for advanced analytics, statistics, and data management. It's widely used in industries like healthcare, finance, and government.

11. **Apache Hadoop**: Hadoop is an open-source framework for distributed storage and processing of large datasets. Its core components include HDFS for storage and MapReduce for processing, and it is often paired with tools like Hive for SQL-style querying.

12. **Apache Spark**: Apache Spark is a fast and versatile big data processing framework. It's suitable for batch processing, streaming data, machine learning, and graph processing.

13. **Databricks**: Databricks is a cloud-based platform that simplifies data engineering, data science, and machine learning tasks with a collaborative and integrated workspace.

14. **Google Colab**: Google Colab is a free cloud-based Jupyter notebook environment that provides access to GPU and TPU resources. It's often used for machine learning and deep learning experiments.

15. **IBM Watson Studio**: IBM Watson Studio is a cloud-based data science and machine learning platform that offers tools for data preparation, model development, and deployment.

16. **Microsoft Azure Machine Learning**: Azure Machine Learning is a cloud-based service for building, training, and deploying machine learning models on Microsoft Azure.

17. **Amazon SageMaker**: SageMaker is a machine learning service provided by Amazon Web Services (AWS) that simplifies the process of building, training, and deploying machine learning models.

18. **DataRobot**: DataRobot is an automated machine learning platform that helps data scientists and organizations quickly build and deploy machine learning models.


### Below are a few examples of evaluating arithmetic expressions in Python:


1. **Using Python's Built-in `eval` function:**

   ```python
   expression = "3 + 5 * 2"
   result = eval(expression)
   print(result)  # Output: 13
   ```

   The `eval` function evaluates a Python expression given as a string. Because it will execute arbitrary code, it should only be used on trusted input.

2. **Using the `sympy` Library (Symbolic Mathematics):**

   ```python
   from sympy import symbols

   x = symbols('x')
   expression = 3 * x + 2
   result = expression.subs(x, 5)
   print(result)  # Output: 17
   ```

   The `sympy` library allows you to work with symbolic mathematics, making it suitable for more complex expressions and symbolic manipulation.

3. **Using the `numpy` Library (Numeric Computing):**

   ```python
   import numpy as np

   expression = "3 * sin(0.5)"
   result = eval(expression, {"sin": np.sin})
   print(result)  # Output: approximately 1.4383
   ```

   Here the dictionary passed to `eval` maps the name `sin` to `np.sin`, so the expression string can call NumPy's sine function.

4. **Using Third-party Libraries like `numexpr`:**

   ```python
   import numexpr as ne

   expression = "3 * (a + b)"
   a, b = 2, 4
   result = ne.evaluate(expression)
   print(result)  # Output: 18
   ```

   The `numexpr` library is designed for fast numerical expression evaluation, especially for large arrays of data.

5. **Using a Wrapper Function Around `eval` (Basic Error Handling):**

   ```python
   def evaluate_expression(expression):
       try:
           return eval(expression)
       except Exception as e:
           return str(e)

   expression = "3 + 5 * 2"
   result = evaluate_expression(expression)
   print(result)  # Output: 13
   ```

   Here, we define a custom function that uses `eval` and handles exceptions.

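6. **Using the `eval` Function with a Restricted Globals Dictionary:**

   ```python
   expression = "3 + 5 * 2"
   restricted_globals = {"__builtins__": {}}  # no built-in functions are exposed
   result = eval(expression, restricted_globals)
   print(result)  # Output: 13
   ```

   Passing a globals dictionary with an empty `__builtins__` entry removes direct access to built-in functions and limits the names the expression can use. This reduces the risk of using `eval` but does not make it a secure sandbox, so untrusted input should still not be passed to it.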
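
For input that cannot be trusted, one alternative to `eval` is to parse the expression with the standard-library `ast` module and evaluate only the node types that are explicitly allowed. The sketch below is a minimal example under that assumption; the function name `safe_arithmetic_eval` and the choice of supported operators are illustrative, not part of any library.

```python
import ast
import operator

# Map supported AST operator node types to the functions that implement them.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arithmetic_eval(expression):
    """Evaluate +, -, *, / on numeric literals; reject everything else."""

    def _eval(node):
        # Plain int or float literals (bool is deliberately excluded).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(safe_arithmetic_eval("3 + 5 * 2"))    # Output: 13
print(safe_arithmetic_eval("(3 * 4) + 5"))  # Output: 17
```

Names, function calls, and attribute access all raise `ValueError` here, so a string such as `"__import__('os')"` is rejected instead of executed.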


### Code cell to multiply and add numbers:

```python
(3 * 4) + 5  # Output: 17
```

### Code cell to convert minutes to hours:

```python
# Input: number of minutes
minutes = 120  # Change this value to the number of minutes you want to convert

# Convert minutes to hours
hours = minutes / 60

# Display the result
print(f"{minutes} minutes is equal to {hours} hours")  # Output: 120 minutes is equal to 2.0 hours
```
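
When the number of minutes is not a multiple of 60, a fractional result such as 2.25 hours may be less readable than whole hours plus leftover minutes. A possible variant, sketched here with a hypothetical helper named `split_minutes`, uses the built-in `divmod`:

```python
def split_minutes(total_minutes):
    """Return (hours, minutes) for a non-negative whole number of minutes."""
    return divmod(total_minutes, 60)


# 135 is an arbitrary example value.
whole_hours, remaining_minutes = split_minutes(135)
print(f"135 minutes is {whole_hours} h {remaining_minutes} min")  # Output: 135 minutes is 2 h 15 min
```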


## Author: Shivam Chaudhary