# Dataset gathering from different sources (CSV, JSON, API)
---

Based on data on students' IQ and CGPA, predict whether they will get a placement. Use logistic regression.


---

## 📑 Contents

1. CSV Format
2. Preprocess + EDA + Feature Selection
3. Extract input and output cols
4. Train Test split
5. Scale the values
6. Train the model
7. Evaluate the model
8. Save the model

# 1. CSV Format

```python
import pandas as pd

df = pd.read_csv("data.csv")  # read a CSV file into a DataFrame; "data.csv" is a placeholder path
```

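`pd.read_csv()` needs at least a source to read. As a minimal, self-contained sketch, the snippet below passes an `io.StringIO` buffer in place of a file path. The column names echo the IQ/CGPA placement problem above, but the rows are made-up sample values, not real data.

```python
import io

import pandas as pd

# Made-up sample rows standing in for a CSV file on disk.
csv_text = """iq,cgpa,placement
123,6.8,1
106,5.9,0
121,5.3,0
"""

# filepath_or_buffer accepts any file-like object, so StringIO replaces a path here.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (3, 3)
print(df.head())
```

The same call works unchanged with a real path or URL in place of the buffer.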

| **Parameter**                                                                                     | **Description**                                                                             | **Example**                                                          |
| ------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| `filepath_or_buffer`                                                                              | File path, URL or file-like object                                                          | `pd.read_csv('data.csv')`                                            |
| `sep`, `delimiter`                                                                                | Field separator (delimiter). If `None`, auto-detection using Python engine. Regex allowed.  | `pd.read_csv(..., sep=';')`                                          |
| `header`                                                                                          | Row(s) to use as column names. `'infer'`, int, list of ints, or `None`                      | `pd.read_csv(..., header=1)`                                         |
| `names`                                                                                           | List of column names to use                                                                 | `pd.read_csv(..., names=['A','B','C'], header=None)`                 |
| `index_col`                                                                                       | Column(s) to set as index. Can be labels, ints, or `False`.                                 | `pd.read_csv(..., index_col='date')`                                 |
| `usecols`                                                                                         | Return subset of columns (by list of labels or indices)                                     | `pd.read_csv(..., usecols=['A','B'])`                                |
| `dtype`                                                                                           | Data type(s) for columns — single type or dict                                              | `pd.read_csv(..., dtype={'A': float, 'B': int})`                     |
| `converters`                                                                                      | Dict of custom functions to convert column values                                           | `pd.read_csv(..., converters={'A': lambda x: x.strip()})`            |
| `true_values`, `false_values`                                                                     | Strings to recognize as Boolean `True` or `False`                                           | `pd.read_csv(..., true_values=['Yes'], false_values=['No'])`         |
| `skipinitialspace`                                                                                | Skip spaces after delimiter                                                                 | `pd.read_csv(..., skipinitialspace=True)`                            |
| `skiprows`, `skipfooter` | Skip specified rows from the start or end of the file (`skipfooter` is not supported by the C engine, so use `engine='python'`) | `pd.read_csv(..., skiprows=2, skipfooter=1, engine='python')` |
| `nrows`                                                                                           | Read only specified number of rows                                                          | `pd.read_csv(..., nrows=100)`                                        |
| `na_values`, `keep_default_na`, `na_filter`                                                       | Additional strings to recognize as NaN, whether to keep defaults, and whether to detect NAs | `pd.read_csv(..., na_values=['N/A'], keep_default_na=False)`         |
| `verbose` | Report the number of NA values placed in non-numeric columns (deprecated in pandas 2.2) | `pd.read_csv(..., verbose=True)` |
| `skip_blank_lines`                                                                                | Skip blank lines rather than reading as NaN lines                                           | `pd.read_csv(..., skip_blank_lines=False)`                           |
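| `parse_dates`, `date_format`, `dayfirst`, `cache_dates`, `infer_datetime_format`, `keep_date_col`, `date_parser` | Date parsing options. `date_format` (pandas ≥ 2.0) replaces `date_parser`; `infer_datetime_format` and `keep_date_col` are deprecated | `pd.read_csv(..., parse_dates=['date'], date_format='%Y-%m-%d')` |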
| `iterator`, `chunksize`                                                                           | Return TextFileReader for iteration or chunked read                                         | `reader = pd.read_csv(..., chunksize=1000); chunk = next(reader)`    |
| `compression`                                                                                     | Compression type: `'infer'`, `'gzip'`, `'bz2'`, `'zip'`, `'xz'`, or custom                  | `pd.read_csv(..., compression='zip')`                                |
| `thousands`, `decimal`                                                                            | Characters recognized as thousands separators or decimal points                             | `pd.read_csv(..., thousands=',', decimal='.')`                       |
| `lineterminator`                                                                                  | Character to break lines                                                                    | `pd.read_csv(..., lineterminator='\n')`                              |
| `quotechar`, `quoting`, `doublequote`, `escapechar`, `comment`                                    | Control quoting behavior and comment char                                                   | `pd.read_csv(..., quotechar='"', quoting=1, comment='#')`            |
| `encoding`, `encoding_errors`                                                                     | File encoding and error handling mode                                                       | `pd.read_csv(..., encoding='utf-8', encoding_errors='ignore')`       |
| `dialect`                                                                                         | Parser dialect                                                                              | `pd.read_csv(..., dialect='excel')`                                  |
| `on_bad_lines`                                                                                    | How to handle malformed lines: `'error'`, `'warn'`, or `'skip'`                             | `pd.read_csv(..., on_bad_lines='skip')`                              |
| `low_memory`, `memory_map`, `float_precision`                                                     | Control memory use, mapping, and float reading precision                                    | `pd.read_csv(..., low_memory=False)`                                 |
| `storage_options`                                                                                 | Extra options for remote storage                                                            | `pd.read_csv(..., storage_options={'anon': True})`                   |
| `dtype_backend`                                                                                   | Backend for DataFrame dtypes: `'numpy_nullable'` or `'pyarrow'`                             | `pd.read_csv(..., dtype_backend='pyarrow')`                          |
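Several of the parameters above are often combined in one call. This sketch uses made-up, semicolon-separated data to show `sep`, `comment`, `usecols`, `thousands`, `na_values`, `true_values`/`false_values`, and `parse_dates` working together; the column names and the `"unknown"` missing-value marker are hypothetical.

```python
import io

import pandas as pd

# Made-up data: a comment line, ';' separators, thousands separators,
# a custom missing-value marker, and a Yes/No column.
raw = """# exported from a hypothetical spreadsheet
date;city;population;capital
2024-01-15;Pune;7,000,000;No
2024-02-01;Delhi;unknown;Yes
2024-03-10;Mumbai;12,500,000;No
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                                     # field separator
    comment="#",                                 # fully commented lines are skipped
    usecols=["date", "population", "capital"],   # drop the 'city' column
    thousands=",",                               # "7,000,000" -> 7000000
    na_values=["unknown"],                       # treat "unknown" as NaN
    true_values=["Yes"],
    false_values=["No"],
    parse_dates=["date"],                        # ISO strings -> datetime64
)

print(df)
print(df.dtypes)
```

Because one population value is missing, that column comes back as `float64` rather than an integer type.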


# 2. JSON Format

| **Parameter**        | **Description**                                                                                                                                                                                             | **Example**                                              |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------- |
| `path_or_buf` | File path, URL (http, ftp, s3, file), or file-like object/buffer. Passing a literal JSON string is deprecated (pandas ≥ 2.1); wrap it in `io.StringIO` instead. ([Spark By {Examples}][1], [Pandas][2]) | `pd.read_json('data.json')` |
| `orient`             | Specifies JSON layout. Options depend on `typ`:<br>– **frame** (default): `split`, `records`, `index`, `columns`, `values`, `table`<br>– **series**: `split`, `records`, `index` ([Pandas][2], [Pandas][3]) | `pd.read_json(..., orient='records')`                    |
| `typ`                | Output type: `'frame'` (DataFrame) or `'series'` ([Pandas][2], [Pandas][3])                                                                                                                                 | `pd.read_json(..., typ='series')`                        |
| `dtype`              | `True` to infer types (default for non-table), `False` to skip inference, or a dict mapping columns to types ([Pandas][2], [Pandas][3])                                                                     | `pd.read_json(..., dtype={'col1': float})`               |
| `convert_axes`       | Whether to convert index/column types (default True for non-table) ([Pandas][2], [Pandas][3])                                                                                                               | `pd.read_json(..., convert_axes=False)`                  |
| `convert_dates`      | Dates conversion: `True`, `False`, or list of column names. Default True. ([Pandas][2], [Pandas][3])                                                                                                        | `pd.read_json(..., convert_dates=['date_col'])`          |
| `keep_default_dates` | If True, also parse default date-like columns (names ending in `_at` or `_time`, starting with `timestamp`, or equal to `date` or `modified`) when `convert_dates` is enabled. ([Pandas][2], [Pandas][3]) | `pd.read_json(..., keep_default_dates=False)` |
| `precise_float`      | Use high-precision float parsing (strtod) over default fast parsing. ([Pandas][2], [Pandas][3])                                                                                                             | `pd.read_json(..., precise_float=True)`                  |
| `date_unit`          | Unit for converting epoch timestamps: `'s'`, `'ms'`, `'us'`, or `'ns'` ([Pandas][3])                                                                                                                        | `pd.read_json(..., date_unit='ms')`                      |
| `encoding`           | File encoding (default `'utf-8'`) ([Pandas][3])                                                                                                                                                             | `pd.read_json(..., encoding='utf-8')`                    |
| `encoding_errors`    | How to handle encoding errors, e.g., `'strict'`, `'ignore'`. Default `'strict'`. ([Pandas][2], [Pandas][3])                                                                                                 | `pd.read_json(..., encoding_errors='ignore')`            |
| `lines`              | If `True`, expects line-delimited JSON. Default `False`. ([Pandas][2], [Pandas][3])                                                                                                                         | `pd.read_json(..., lines=True)`                          |
| `chunksize` | Number of lines per chunk. Can only be used with `lines=True`; returns a `JsonReader` to iterate over. ([Pandas][2], [Pandas][3]) | `reader = pd.read_json(..., lines=True, chunksize=1000)` |
| `compression`        | Compression: `'infer'`, `'gzip'`, `'bz2'`, `'zip'`, `'xz'`, dict. ([Pandas][2], [Pandas][3])                                                                                                                | `pd.read_json(..., compression='zip')`                   |
| `nrows`              | Read only first N rows (must be used with `lines=True`). ([Pandas][2], [Pandas][3])                                                                                                                         | `pd.read_json(..., lines=True, nrows=10)`                |
| `storage_options`    | Parameters for remote storage (S3, GCS, etc.) ([Pandas][2], [Pandas][3])                                                                                                                                    | `pd.read_json(..., storage_options={'anon':True})`       |
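| `dtype_backend` | Backend for dtypes: `'numpy_nullable'` or `'pyarrow'`. Added in pandas 2.0; NumPy-backed dtypes are used by default. ([Pandas][2]) | `pd.read_json(..., dtype_backend='numpy_nullable')` |
| `engine` | Parser engine: `'ujson'` (default) or `'pyarrow'`. Added in pandas 2.0; `'pyarrow'` is only available with `lines=True`. ([Pandas][2]) | `pd.read_json(..., lines=True, engine='pyarrow')` |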
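The `orient` parameter has to match how the JSON is laid out. A small sketch with the same two made-up rows encoded once as `records` and once as `split`; the literal strings are wrapped in `io.StringIO` because passing raw JSON text to `read_json` is deprecated.

```python
import io

import pandas as pd

# The same made-up rows in two common JSON layouts.
records_json = '[{"name": "Asha", "score": 91}, {"name": "Ravi", "score": 78}]'
split_json = (
    '{"columns": ["name", "score"], "index": [0, 1], '
    '"data": [["Asha", 91], ["Ravi", 78]]}'
)

# 'records': a list of row objects; 'split': columns, index and data stored separately.
df_records = pd.read_json(io.StringIO(records_json), orient="records")
df_split = pd.read_json(io.StringIO(split_json), orient="split")

print(df_records)
print(df_split)
```

Choosing the wrong `orient` typically raises a `ValueError` or yields a transposed or oddly shaped frame, so it is worth checking the JSON structure first.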

[1]: https://sparkbyexamples.com/pandas/pandas-read-json-with-examples/ "Pandas Read JSON File with Examples"
[2]: https://pandas.pydata.org/docs/reference/api/pandas.read_json.html "pandas.read_json — pandas 2.3.1 documentation"
[3]: https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.read_json.html "pandas.read_json — pandas 1.5.2 documentation"
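For large line-delimited JSON files, combining `lines=True` with `chunksize` lets you process the data piece by piece instead of loading it all at once. A minimal sketch; the click records are hypothetical.

```python
import io

import pandas as pd

# Made-up newline-delimited JSON: one object per line.
jsonl = "\n".join([
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": 5}',
    '{"user": "c", "clicks": 2}',
    '{"user": "d", "clicks": 7}',
    '{"user": "e", "clicks": 1}',
])

total_clicks = 0
n_chunks = 0
# chunksize requires lines=True; the reader yields one DataFrame per chunk.
with pd.read_json(io.StringIO(jsonl), lines=True, chunksize=2) as reader:
    for chunk in reader:
        total_clicks += int(chunk["clicks"].sum())
        n_chunks += 1

print(n_chunks, total_clicks)  # 3 18
```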


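# 3. API

```python
import pandas as pd
import requests

url = "https://api.example.com/data"  # hypothetical placeholder; replace with the real endpoint
response = requests.get(url, timeout=10)  # requests.get() requires at least a URL
response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
df = pd.DataFrame(response.json())  # assumes the API returns a JSON list of records
```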
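Many APIs return results one page at a time. The sketch below assumes a hypothetical API that accepts a `page` query parameter and returns a JSON object with its records under a `"results"` key; `collect_pages` and `make_requests_fetcher` are illustrative helper names, not part of pandas or requests. Keeping the HTTP call behind a `fetch_page` function makes the stacking logic testable offline.

```python
from typing import Callable, Optional

import pandas as pd


def collect_pages(fetch_page: Callable[[int], dict], n_pages: int, key: str = "results") -> pd.DataFrame:
    """Call fetch_page(1..n_pages) and stack each page's records into one DataFrame.

    Assumes each page is a JSON object holding a list of records under `key`
    (a hypothetical but common layout; adjust it to the real API).
    """
    frames = [pd.DataFrame(fetch_page(page)[key]) for page in range(1, n_pages + 1)]
    return pd.concat(frames, ignore_index=True)


def make_requests_fetcher(url: str, params: Optional[dict] = None, timeout: float = 10.0) -> Callable[[int], dict]:
    """Build a fetch_page function for an API with a hypothetical `page` query parameter."""
    import requests  # imported lazily so collect_pages works without requests installed

    def fetch_page(page: int) -> dict:
        response = requests.get(url, params={**(params or {}), "page": page}, timeout=timeout)
        response.raise_for_status()  # fail loudly instead of parsing an error body
        return response.json()

    return fetch_page


# Offline usage with a fake fetcher, so no network access is needed.
fake_pages = {1: {"results": [{"id": 1}, {"id": 2}]}, 2: {"results": [{"id": 3}]}}
df = collect_pages(lambda page: fake_pages[page], n_pages=2)
print(df["id"].tolist())  # [1, 2, 3]
```

Against a real endpoint, something like `collect_pages(make_requests_fetcher(url), n_pages=5)` would apply, with the URL, query parameters, and response key taken from that API's documentation.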