# Libraries for Data Science

## Overview
A library is a collection of prewritten functions and methods that lets you perform common tasks without writing the code yourself.

### Scientific Computing Libraries in Python
- Pandas: Provides data structures and tools for data cleaning, manipulation, and analysis.
- NumPy: Built around arrays and matrices, so mathematical functions can be applied to whole arrays at once.
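
As a minimal sketch of what "applying mathematical functions to arrays" means, the snippet below uses two standard NumPy calls, `np.sqrt` and `ndarray.sum`. The input values are made up for illustration.

```python
import numpy as np

# Illustrative values only; any numeric array would do.
values = np.array([1.0, 4.0, 9.0])

# np.sqrt is applied to every element at once, with no explicit loop.
roots = np.sqrt(values)

# A 2-D array behaves like a matrix; axis=0 sums down each column.
matrix = np.array([[1, 2], [3, 4]])
column_sums = matrix.sum(axis=0)

print(roots)
print(column_sums)
```

The same element-wise style carries over to Pandas, whose columns are backed by NumPy arrays in most cases.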

### Visualization Libraries in Python
- Matplotlib: A popular library for creating graphs and plots, with customizable options.
- Seaborn: Built on Matplotlib; makes it easy to generate statistical plots such as heat maps, time series, and violin plots.

### High-Level Machine Learning and Deep Learning Libraries in Python
- Scikit-learn: Contains tools for statistical modeling, including regression, classification, clustering, etc.
- Keras: Allows building standard deep learning models with a high-level interface.
- TensorFlow: A lower-level framework for building and deploying deep learning models at production scale.
- PyTorch: Often used for experimentation, making it simple for researchers to test ideas.

### Libraries Used in Other Languages
- Apache Spark: A general-purpose cluster-computing framework for processing data in parallel across a compute cluster.
- Scala Libraries:
    - Vegas: A Scala library for statistical data visualizations.
    - BigDL: A library for deep learning.
- R Libraries:
    - ggplot2: A popular library for data visualization.
    - Keras and TensorFlow interfaces: Let R code call the corresponding Python libraries.

## Application Programming Interfaces (APIs)

### Overview
- An API (Application Programming Interface) allows two pieces of software to communicate with each other.
- The API is the part of a library that is visible to the user; the library itself contains all the program components.
- Example: the Pandas library
    - Pandas is a set of software components that can be used to process data.
    - The Pandas API lets your program use those components without knowing what happens in the backend.
    - A library's backend can be written in a different language from its API, such as C++.

### REST API
- REST (Representational State Transfer) APIs allow communication over the internet and give access to resources such as storage, data, and artificial intelligence algorithms.
- The client (your program) sends requests to the resource (web service) via an endpoint.
- The client sends requests using HTTP methods, and the resource returns a response using HTTP messages.
- The request usually carries a JSON document with instructions for the operation to be performed.
- The response usually returns the result of the operation as a JSON document.
- Examples of REST APIs:
    - Watson Speech to Text API: Converts speech to text.
    - Watson Language Translator API: Translates text from one language to another.
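
The request and response cycle described above can be sketched with Python's standard library alone. Everything service-specific here is an assumption made for illustration: the endpoint URL, the payload field names, and the sample response body are hypothetical and do not describe any real Watson API. The request is built but never sent.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
ENDPOINT = "https://api.example.com/v1/translate"


def build_request(text, source, target):
    """Package the instructions as JSON and wrap them in an HTTP POST request."""
    # Field names are assumptions, not a real service's schema.
    payload = {"text": text, "source": source, "target": target}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw_bytes):
    """Decode the JSON body that a service would return."""
    return json.loads(raw_bytes.decode("utf-8"))


request = build_request("Hello", "en", "es")

# Made-up response body standing in for what a service might return.
sample_response = b'{"translation": "Hola"}'
result = parse_response(sample_response)

print(request.get_method(), request.full_url)
print(result["translation"])
```

With a real service, the request would be sent with `urllib.request.urlopen(request)` and would usually also need an authentication header, which is omitted here.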


## Data Sets – Powering Data Science

### Definition of a Data Set
- A data set is a structured collection of data.
- It can include various types of data such as text, numbers, images, audio, or video files.
- Tabular data sets are organized in rows and columns, with each row representing an observation and each column containing information about that observation.
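
To make the row and column structure concrete, here is a small sketch that reads a tabular data set with Python's built-in `csv` module. The column names and values are invented for the example.

```python
import csv
import io

# Made-up tabular data: the first line names the columns,
# and each following line is one observation (row).
raw_csv = (
    "sample_id,length_cm,label\n"
    "1,5.1,A\n"
    "2,4.9,A\n"
    "3,6.3,B\n"
)

# DictReader maps each row to a dict keyed by column name.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Reading down one column gives the same attribute for every observation.
lengths = [float(row["length_cm"]) for row in rows]

print(len(rows), "observations")
print(rows[0])
print(lengths)
```

Each dictionary in `rows` is one observation, and each key is a column describing that observation.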

### Types of Data Ownership
- Private Data: Typically contains proprietary or confidential information and is not shared publicly.
- Open Data: Made available to the public, often by governments, organizations, or companies, and can be used for various purposes.

### Sources of Data
- Government Data: Many governments publish data sets on their websites, covering topics such as the economy, society, healthcare, and transportation.
- Intergovernmental Organizations: Organizations like the United Nations and the European Union maintain data repositories providing access to a wide range of information.
- Online Communities: Platforms like Kaggle provide access to a variety of data sets, and users can contribute their own.

### Community Data License Agreement (CDLA)
- Created by the Linux Foundation to address the issue of open data distribution and use.
- Two licenses were initially created: CDLA-Sharing and CDLA-Permissive.
- CDLA-Sharing: Grants permission to use and modify the data, with the requirement to share modified versions under the same license terms.
- CDLA-Permissive: Grants permission to use and modify the data, but does not require sharing changes to the data.
