# Class: Subreddit_Predictor

Objects of this class contain attributes and methods that fall into three categories: **Data**, **Collections**, and **Processing**.

## Data
Pandas DataFrames and methods to update and clean the data.

**Attributes:**

| Name      |   Type    |             Description             |
|:----------|:---------:|:-----------------------------------|
| raw_data  | DataFrame |      The raw unprocessed data       |
| full_data | DataFrame |         The processed data          |
| subreddits |   list    | A list of subreddits, extracted from full_data. |
| X_train   | pd.Series | the X portion of the training data  |
| Y_train   | np.array  | the Y portion of the training data  |
| X_test    | pd.Series |   the X portion of the test data    |
| Y_test    | np.array  |   the Y portion of the test data    |

**Methods:**

| Name (with input/output typing) | Description                                                                                                                               |
|---------------------------------| ------------------------------------------------------------------------------------------------------------------------------------------|
| add_data(df: DataFrame)         | Updates the raw_data attribute                                                                                                            |
| ready_data()                    | Cleans the data and performs a train/test split. <br/> Overwrites the full_data attribute. <br/> Creates the X_train, Y_train, X_test, and Y_test attributes. |
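
As a rough illustration of what `ready_data()` describes, the sketch below cleans a raw DataFrame and splits it into train and test portions. The column names `title` and `subreddit`, the cleaning steps, and the `test_fraction` and `seed` parameters are assumptions made for this example; the actual method may clean and split differently.

```python
import numpy as np
import pandas as pd


def ready_data_sketch(raw_data: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0):
    """Clean raw_data and split it into train and test portions.

    Assumes hypothetical columns 'title' and 'subreddit'.
    """
    # Cleaning: drop incomplete rows, trim whitespace, drop empty and duplicate titles.
    full_data = raw_data.dropna(subset=["title", "subreddit"]).copy()
    full_data["title"] = full_data["title"].str.strip()
    full_data = full_data[full_data["title"] != ""]
    full_data = full_data.drop_duplicates(subset="title").reset_index(drop=True)

    # Shuffle, then hold out the last test_fraction of the rows for testing.
    shuffled = full_data.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(round(len(shuffled) * test_fraction))
    n_train = len(shuffled) - n_test
    train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]

    # X portions stay pd.Series, Y portions become np.array, as in the attribute table.
    X_train, Y_train = train["title"], train["subreddit"].to_numpy()
    X_test, Y_test = test["title"], test["subreddit"].to_numpy()
    subreddits = sorted(full_data["subreddit"].unique())
    return full_data, subreddits, X_train, Y_train, X_test, Y_test
```

Returning the pieces as a tuple keeps the sketch self-contained; inside the class they would presumably be stored as attributes instead.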

## Collections
Dictionaries of vectorizers, classifiers, and trained models, along with their predictions and results.

**Attributes:**

| Name        | Type                  | Description                                                                                                                                |
|:------------|:----------------------|:-------------------------------------------------------------------------------------------------------------------------------------------|
| Vectorizers | dict | Dictionary of Vectorizer objects                                                                                                           |
| Feature_Vectors | dict | Dictionary of the vectorized full_data |
| Classifiers | dict | Dictionary of Classifier objects                                                                                                           |
| Models      | dict | Dictionary of trained Classifier objects                                                                                                   |
| Models_info | dict        | Dictionary containing a description of each model in Models.                                                                               |
| Predictions | DataFrame   | A DataFrame with all the titles and actual subreddits in X_test and Y_test. <br/> Has one column of predictions per model. |
| Results | dict | Dictionary with one DataFrame per model. Each row and column is a subreddit; the cells count the misclassifications. |

**Methods:**

| Name (with input/output typing)                                                                                  | Description                                                                                                                                                                                |
|------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| add_vectorizer(model: Vectorizer)                                                                                | Trains the vectorizer. <br/>Adds (key = model.vectorizerName, value = model) to Vectorizers                                                                                                     |
| add_feature_vectors(vectorizerName: str)| Embeds X_train as vectors and creates a DataFrame of the new feature vectors. <br/> Adds this DataFrame to Feature_Vectors |
| add_classifier(model: Classifier)                                                                                | Adds (key = model.name, value = model) to Classifiers                                                                                                                                      |
| train_model(<br/>modelName: str, <br/>vectorizerName: str, <br/>classifierName: str, <br/>description: str = '') | Takes the vectorizer and classifier from Vectorizers and Classifiers. <br/>Trains the classifier.<br/>Names the trained model and adds it to Models.<br/>Adds the description text to Models_info. |
| test_model(modelName: str) | Runs the model against X_test and Y_test. <br/> Updates Predictions and Results                                                                                                            |
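
To make the shape of a `Results` entry more concrete, here is a minimal sketch of how one such DataFrame could be built from `Y_test` and a model's predictions. The function name `misclassification_table` and the choice of rows for actual subreddits and columns for predicted ones are assumptions for this example, not something the tables above specify.

```python
import numpy as np
import pandas as pd


def misclassification_table(Y_test, predictions, subreddits):
    """Count wrong predictions per (actual, predicted) pair of subreddits."""
    actual = pd.Series(Y_test, name="actual")
    predicted = pd.Series(predictions, name="predicted")

    # Keep only the rows where the model was wrong, then count each pair.
    wrong = actual != predicted
    table = pd.crosstab(actual[wrong], predicted[wrong])

    # Make sure every subreddit appears as both a row and a column.
    return table.reindex(index=subreddits, columns=subreddits, fill_value=0)
```

Because correct predictions are filtered out first, the diagonal of the table is always zero and every remaining cell is a count of false classifications.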

## Processing

**Methods:**

| Name (with input/output typing)                       | Description                                                                                   |
|-------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| predict(modelName: str, title: str, titles: iter[str]) | Given a model name and either a single title or a list/DataFrame of titles, returns the model's prediction(s). |
| compare(models: list[str])                            | Creates a bar chart comparing each of the models on each of the subreddits.                   |
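
The numbers behind a chart like the one `compare` produces could be computed along these lines. This is only a sketch: it assumes that `Predictions` has a `subreddit` column holding the actual labels plus one column per model, and that the models are compared by per-subreddit accuracy. Neither the column name nor the metric is stated above.

```python
import pandas as pd


def per_subreddit_accuracy(predictions: pd.DataFrame, models: list) -> pd.DataFrame:
    """Return a table with one row per subreddit and one column per model."""
    # True where a model's prediction matches the actual subreddit.
    correct = predictions[models].eq(predictions["subreddit"], axis=0)

    # The mean of the booleans within each subreddit is that model's accuracy there.
    return correct.groupby(predictions["subreddit"]).mean()
```

Calling `.plot.bar()` on the returned DataFrame (which requires matplotlib) would draw grouped bars, one group per subreddit and one bar per model.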

# Class: Vectorizer

Objects of this class are vectorizers, like Bag-of-Words or Doc2Vec. They have very few attributes and methods.
Each object of this class overrides all of these attributes and methods with its own.

**Attributes:**

| Name           | Type | Description                                                                                         |
|----------------|------|-----------------------------------------------------------------------------------------------------|
| vectorizerName | str | The name of this vectorizer. <br/> This will be the key for any dictionaries containing it.         |
| description    | str | A brief description of what this vectorizer is/does. <br/>Put the parameters here if there are any. |
| model | Other | The actual model. Typically an object of a class like Gensim or SCM. |

**Methods:**

| Name                                  | Description                                                                                          |
|---------------------------------------|------------------------------------------------------------------------------------------------------|
| train(X_train: DataFrame)             | Uses the training data to train the model.                                                           |
| embed(titles: DataFrame) -> DataFrame | Takes a list/DataFrame of titles and returns a DataFrame of the embeddings for each of them. |
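
A minimal Bag-of-Words vectorizer following the tables above might look like the sketch below. The class name `BagOfWordsVectorizer`, the whitespace tokenization, and the use of a plain vocabulary list as `model` (instead of an object from an external library) are all assumptions for illustration. The sketch also assumes the titles arrive as an iterable of strings, such as a list or a pd.Series.

```python
import pandas as pd


class BagOfWordsVectorizer:
    """Hypothetical Vectorizer that embeds titles as word counts."""

    def __init__(self):
        self.vectorizerName = "bow"
        self.description = "Bag-of-Words word counts; lowercased, split on whitespace."
        self.model = []  # the learned vocabulary stands in for a library model

    def train(self, X_train):
        # Collect the sorted set of words seen in the training titles.
        words = set()
        for title in X_train:
            words.update(str(title).lower().split())
        self.model = sorted(words)

    def embed(self, titles):
        # One row per title, one column per vocabulary word; unseen words are ignored.
        index = {word: i for i, word in enumerate(self.model)}
        rows = []
        for title in titles:
            counts = [0] * len(self.model)
            for word in str(title).lower().split():
                if word in index:
                    counts[index[word]] += 1
            rows.append(counts)
        return pd.DataFrame(rows, columns=self.model)
```

Words that never appeared during training are simply dropped by `embed`, so every embedding has the same columns regardless of the input titles.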



# Class: Classifier

This class holds the classifiers, like XGBoost and Support Vector Machines.
It also has very few attributes and methods.

**Attributes:**

|Name | Type | Description |
|-----|-------|--------|
| classifierName | str | The name of this classifier |
| description | str | A brief description of this classifier |
| model | Other | Where the actual model is stored. Typically an instance of a class from another library. |

**Methods:**

|Name | Description|
|-----| --------|
|train(X_train: pd.Series, Y_train: np.array) | Trains the model. |
|predict(titles: pd.Series) | Predicts the subreddit for each title. |
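
To show the interface in the smallest possible form, the sketch below implements a baseline classifier that always predicts the most frequent training label. It is not XGBoost or a Support Vector Machine; the class name `MajorityClassifier` and the choice to store a single label in `model` are assumptions made for this example.

```python
import numpy as np
import pandas as pd


class MajorityClassifier:
    """Hypothetical baseline Classifier following the tables above."""

    def __init__(self):
        self.classifierName = "majority"
        self.description = "Baseline that always predicts the most frequent training label."
        self.model = None  # here just the stored label, not a library object

    def train(self, X_train, Y_train):
        # X_train is ignored: the baseline only looks at the label frequencies.
        labels, counts = np.unique(Y_train, return_counts=True)
        self.model = labels[np.argmax(counts)]

    def predict(self, titles):
        # Return the stored label once per title.
        return np.full(len(titles), self.model, dtype=object)
```

A baseline like this can be a useful reference point: any real classifier added to `Classifiers` should at least beat it on the test data.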