## pd.read_csv

Read a CSV (comma-separated values) file into a DataFrame.

### Parameters
**_filepath_or_buffer_** : str, pathlib.Path, py._path.local.LocalPath or any \

**_sep_** : str, default ‘,’

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'

_**header**_ : int or list of ints, default ‘infer’

Row number(s) to use as the column names, and the start of the data. The default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns, e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines, and also empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

_**names**_ : array-like, default None

List of column names to use. If the file contains no header row, you should explicitly pass header=None. Duplicates in this list will cause a UserWarning to be issued.
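
A minimal sketch of how `sep`, `header` and `names` interact. The semicolon-separated text and the column names `x`, `y`, `z` are made up for illustration, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
raw = "a;b;c\n1;2;3\n4;5;6\n"

# Explicit single-character separator; the first line is inferred as the header.
df_default = pd.read_csv(io.StringIO(raw), sep=";")

# header=0 combined with names replaces the existing column names.
df_renamed = pd.read_csv(io.StringIO(raw), sep=";", header=0, names=["x", "y", "z"])

# header=None treats every line, including the first, as data.
df_no_header = pd.read_csv(io.StringIO(raw), sep=";", header=None)

print(df_default.columns.tolist())
print(df_renamed.columns.tolist())
print(df_no_header.shape)
```

With `header=None` the first line is kept as a data row, so the resulting frame has one more row than the other two.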

____


## DataFrame.iloc

Purely integer-location based indexing for selection by position.
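
A short sketch of position-based selection, using a small made-up frame. The string index labels are there only to show that `iloc` ignores them and works on positions.

```python
import pandas as pd

# Hypothetical frame; the labels "x", "y", "z" play no role in iloc lookups.
df = pd.DataFrame({"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]}, index=["x", "y", "z"])

first_row = df.iloc[0]      # first row as a Series
top_left = df.iloc[0, 0]    # single value at row 0, column 0
last_two = df.iloc[-2:]     # last two rows (negative positions count from the end)
col_b = df.iloc[:, 1]       # every row of the second column

print(top_left)
print(last_two.index.tolist())
print(col_b.tolist())
```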
___

## DataFrame.count

DataFrame.count(axis=0, level=None, numeric_only=False)

Count non-NA cells for each column or row.
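
A small sketch of counting non-NA cells along each axis. The frame is made up for illustration, and only the `axis` argument is used.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values.
df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, 6]})

per_column = df.count()      # axis=0: non-NA cells in each column
per_row = df.count(axis=1)   # axis=1: non-NA cells in each row

print(per_column.to_dict())  # {'a': 2, 'b': 1}
print(per_row.tolist())      # [1, 0, 2]
```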
____

## sklearn.model_selection.train_test_split(*arrays, **options)

### Parameters
**\*arrays** : sequence of indexables with same length / shape[0]

Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.

**test_size** : float, int, None, optional

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. By default, the value is set to 0.25. The default will change in version 0.21. It will remain 0.25 only if train_size is unspecified, otherwise it will complement the specified train_size.

**train_size** : float, int, or None, default None

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

**shuffle** : boolean, optional (default=True)

Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.
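
A minimal sketch of the size and shuffle parameters described above. The arrays are made up, and `random_state` is an extra argument (not covered in these notes) used here only to make the shuffled split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 12 samples with 2 features each.
X = np.arange(24).reshape(12, 2)
y = np.arange(12)

# Float test_size: a proportion of the dataset (0.25 of 12 samples -> 3 test samples).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Int test_size with shuffle=False: the last 3 samples become the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=3, shuffle=False)

print(X_train.shape, X_test.shape)
print(y_te.tolist())
```

With `shuffle=False` the original order is preserved, which is usually what you want for time-ordered data.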
____

## plt.tight_layout()

tight_layout automatically adjusts subplot params so that the subplot(s) fit into the figure area. This is an experimental feature and may not work in some cases. It only checks the extents of tick labels, axis labels, and titles.

tight_layout() can take the keyword arguments pad, w_pad and h_pad. These control the extra padding around the figure border and between subplots. The pads are specified as a fraction of the font size.
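
A short sketch of calling `tight_layout` with the padding arguments on a grid of labelled subplots. The titles, labels and padding values are placeholders, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)
for ax in axes.flat:
    # Labels and titles are what tight_layout measures when it computes spacing.
    ax.set_title("title")
    ax.set_xlabel("x label")
    ax.set_ylabel("y label")

# pad: padding around the figure border; w_pad / h_pad: padding between subplots.
# All three are given as fractions of the font size.
plt.tight_layout(pad=1.0, w_pad=0.5, h_pad=0.5)
```

Call it after all labels and titles are set, since it only accounts for what exists at the time of the call.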
___

## sklearn.neighbors.KNeighborsClassifier

### Parameters

**n_neighbors** : int, optional (default = 5)

Number of neighbors to use by default for kneighbors queries.

**weights** : str or callable, optional (default = ‘uniform’)

Weight function used in prediction. Possible values:

- ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
- ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
- [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
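
A minimal sketch comparing the three kinds of `weights`. The one-dimensional training points and the helper `inverse_square` are made up for illustration; the query is chosen so that one very close neighbor of class 1 competes with two more distant neighbors of class 0.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D training data.
X = np.array([[2.0], [4.5], [2.9], [20.0]])
y = np.array([0, 0, 1, 1])
query = np.array([[3.0]])  # nearest neighbors: 2.9 (class 1), 2.0 and 4.5 (class 0)


def inverse_square(distances):
    """Hypothetical custom weighting: returns an array shaped like `distances`."""
    return 1.0 / (distances ** 2 + 1e-9)


uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X, y)
distance = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)
custom = KNeighborsClassifier(n_neighbors=3, weights=inverse_square).fit(X, y)

print(uniform.predict(query)[0])   # 0: two of the three neighbors are class 0
print(distance.predict(query)[0])  # 1: the closest neighbor dominates the vote
print(custom.predict(query)[0])    # 1: same effect, with a steeper weighting
```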

----


## Confusion Matrix:

**Precision**:

    When it predicts yes, how often is it correct?
    TP/predicted yes = 100/110 = 0.91

**True Positive Rate**: 
                        
    When it's actually yes, how often does it predict yes?
    TP/actual yes = 100/105 = 0.95
    also known as "Sensitivity" or "Recall"
    
**F Score**: The weighted harmonic mean of precision and the true positive rate (recall). The common F1 score weights the two equally.
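
A small worked sketch of these metrics, reusing the counts from the examples above (100 true positives, 110 predicted yes, 105 actual yes). The `f_beta` helper is a hypothetical name; it implements the standard F-beta formula, where `beta=1` gives the F1 score.

```python
# Counts taken from the worked examples above.
tp = 100
predicted_yes = 110
actual_yes = 105

precision = tp / predicted_yes  # how often a "yes" prediction is correct
recall = tp / actual_yes        # how often an actual "yes" is found


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta > 1 favors recall)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


f1 = f_beta(precision, recall)

print(round(precision, 2))  # 0.91
print(round(recall, 2))     # 0.95
print(round(f1, 2))         # 0.93
```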

## sklearn.linear_model.LogisticRegression

**C** = Inverse of regularization strength; must be a positive float. As in support vector machines, smaller values specify stronger regularization. (default = 1.0)
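
A minimal sketch of the effect of `C`, assuming the default L2 penalty. The synthetic dataset from `make_classification`, the two `C` values and the `max_iter` setting are all illustrative choices rather than recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Small C: strong regularization, coefficients are pulled toward zero.
strong = LogisticRegression(C=0.01, max_iter=1000).fit(X, y)

# Large C: weak regularization, coefficients are free to grow.
weak = LogisticRegression(C=100.0, max_iter=1000).fit(X, y)

strong_norm = np.linalg.norm(strong.coef_)
weak_norm = np.linalg.norm(weak.coef_)

print(strong_norm < weak_norm)
```

Comparing coefficient norms is one simple way to see the regularization at work; in practice `C` is usually chosen by cross-validation.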