# Logistic Regression Classifier
___

This model is based on:

```Bibtex
@inproceedings{levyContextDependentClaim2014a,
  title = {Context Dependent Claim Detection},
  author = {Levy, Ran and Bilu, Yonatan and Hershcovich, Daniel and Aharoni, Ehud and Slonim, Noam},
  date = {2014},
  url = {https://aclanthology.org/C14-1141/},
}
```

Features:
- Sentence-topic similarity
- Linguistic expansion
- Keyword "that"
- Sentiment
- Subjectivity
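
The custom transformers imported from `src.features` are not shown in this notebook. As a rough illustration only, a keyword feature such as "that" could be written as a scikit-learn compatible transformer along the following lines. The class name `ThatTokenSketch`, the regular expression, and the example sentences are assumptions made for this sketch, not the project's actual `ThatToken` implementation.

```python
import re

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ThatTokenSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a keyword feature: 1.0 if a sentence contains "that"."""

    # Match "that" as a whole word, ignoring case (an assumption of this sketch).
    _pattern = re.compile(r"\bthat\b", flags=re.IGNORECASE)

    def fit(self, X, y=None):
        # Nothing to learn; the feature is stateless.
        return self

    def transform(self, X):
        # X is any iterable of sentences, e.g. a single DataFrame column.
        flags = [1.0 if self._pattern.search(str(s)) else 0.0 for s in X]
        # Return a 2-D column so the result can be stacked with other features.
        return np.asarray(flags).reshape(-1, 1)


# Made-up example sentences, for illustration only.
example_sentences = [
    "Researchers argue that violent games increase aggression.",
    "The study was published in 2010.",
]
that_features = ThatTokenSketch().fit_transform(example_sentences)
```

Returning a two-dimensional array with one column per feature is what lets a `ColumnTransformer` concatenate this output with the TF-IDF matrix and the other hand-crafted features.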

Parameter:

In [1]:
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

from src.features import ThatToken, Sentiment, Subjectivity, SentenceTopicSimilarity
from src.dataset import load_dataset

### 0. Load data

In [2]:
data = load_dataset()

In [3]:
X_train, X_test, y_train, y_test = train_test_split(
    data, data["Claim"], test_size=0.2, random_state=0
)

### 1. Encode features

In [4]:
text_features = FeatureUnion(transformer_list=[("tf-idf", TfidfVectorizer())])

In [5]:
column_trans = ColumnTransformer(
    [
        ("tf-idf", text_features, "Sentence"),
        ("that", ThatToken(), "Sentence"),
        ("sentiment", Sentiment(), "Sentence"),
        ("subjectivity", Subjectivity(), "Sentence"),
        ("similarity", SentenceTopicSimilarity(), ["Sentence", "Article"]),
    ],
    remainder="drop",
    verbose=True,
)
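
The `similarity` entry above is the only transformer that receives two columns, so it gets a DataFrame rather than a single column. How `SentenceTopicSimilarity` computes its score is not shown here. One plausible approach, sketched below under that assumption, is the cosine similarity between TF-IDF vectors of the two columns. The class name `TfidfSimilaritySketch`, its parameters, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer


class TfidfSimilaritySketch(BaseEstimator, TransformerMixin):
    """Hypothetical two-column feature: TF-IDF cosine similarity per row."""

    def __init__(self, left="Sentence", right="Article"):
        self.left = left
        self.right = right

    def fit(self, X, y=None):
        # Learn one shared vocabulary from both text columns.
        texts = pd.concat([X[self.left], X[self.right]]).astype(str)
        self.vectorizer_ = TfidfVectorizer().fit(texts)
        return self

    def transform(self, X):
        a = self.vectorizer_.transform(X[self.left].astype(str))
        b = self.vectorizer_.transform(X[self.right].astype(str))
        # TF-IDF rows are L2-normalised by default, so the row-wise
        # dot product equals the cosine similarity.
        sims = np.asarray(a.multiply(b).sum(axis=1))
        return sims.reshape(-1, 1)


# Toy data, invented for this sketch.
toy = pd.DataFrame(
    {
        "Sentence": ["Violent games increase aggression", "The weather was sunny"],
        "Article": ["violent video games", "violent video games"],
    }
)
similarities = TfidfSimilaritySketch().fit_transform(toy)
```

Rows whose sentence shares no vocabulary with the topic text score exactly zero, which gives the classifier a direct signal of topical relevance.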

### 2. Create model

In [6]:
pipe = Pipeline(
    [
        ("preprocessing", column_trans),
        ("scaler", StandardScaler(with_mean=False)),
        ("classify", LogisticRegression(max_iter=200)),
    ],
    verbose=True,
)
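
The scaler is created with `with_mean=False` because the TF-IDF step produces a sparse matrix, and subtracting the mean would turn every zero into a non-zero value. The small sketch below, using an invented 3x2 matrix, shows that scikit-learn scales sparse input when centering is disabled and raises a `ValueError` when it is not.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Invented sparse matrix standing in for the preprocessing output.
X_sparse = sparse.csr_matrix(np.array([[0.0, 2.0], [0.0, 4.0], [3.0, 0.0]]))

# Scaling by the standard deviation alone keeps zeros at zero,
# so the result stays sparse.
scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)

# With the default with_mean=True, centering a sparse matrix is refused.
try:
    StandardScaler().fit_transform(X_sparse)
    centering_failed = False
except ValueError:
    centering_failed = True
```

Without centering, the features are only divided by their standard deviation, which is usually enough to put TF-IDF weights and the hand-crafted features on a comparable scale for logistic regression.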

### 4. Train model

In [7]:
pipe.fit(X_train, y_train)

[ColumnTransformer] ........ (1 of 5) Processing tf-idf, total=   0.0s
[ColumnTransformer] .......... (2 of 5) Processing that, total=   0.0s
[ColumnTransformer] ..... (3 of 5) Processing sentiment, total=   0.3s
[ColumnTransformer] .. (4 of 5) Processing subjectivity, total=   0.2s
[ColumnTransformer] .... (5 of 5) Processing similarity, total=   1.1s
[Pipeline] ..... (step 1 of 3) Processing preprocessing, total=   1.7s
[Pipeline] ............ (step 2 of 3) Processing scaler, total=   0.0s
[Pipeline] .......... (step 3 of 3) Processing classify, total=   0.0s


Pipeline(steps=[('preprocessing',
                 ColumnTransformer(transformers=[('tf-idf',
                                                  FeatureUnion(transformer_list=[('tf-idf',
                                                                                  TfidfVectorizer())]),
                                                  'Sentence'),
                                                 ('that', ThatToken(),
                                                  'Sentence'),
                                                 ('sentiment', Sentiment(),
                                                  'Sentence'),
                                                 ('subjectivity',
                                                  Subjectivity(), 'Sentence'),
                                                 ('similarity',
                                                  SentenceTopicSimilarity(),
                                                  ['Sentence', 'Article'])],
           

### 5. Predict results

In [8]:
y_pred = pipe.predict(X_test)

### 6. Evaluate results

In [9]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

       False       0.74      0.71      0.72       235
        True       0.74      0.78      0.76       259

    accuracy                           0.74       494
   macro avg       0.74      0.74      0.74       494
weighted avg       0.74      0.74      0.74       494
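
As a sanity check on how the rows of this report relate to each other, the F1 scores and both averages can be recomputed from the precision, recall, and support values printed above. The sketch below uses the rounded figures from the report, so it can only confirm agreement to two decimal places. The helper name `f1_from` is made up for this example.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the classification report above (already rounded).
report = {
    "False": {"precision": 0.74, "recall": 0.71, "support": 235},
    "True": {"precision": 0.74, "recall": 0.78, "support": 259},
}

f1 = {label: f1_from(row["precision"], row["recall"]) for label, row in report.items()}

# Macro average: every class counts equally.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: every class counts in proportion to its support.
total_support = sum(row["support"] for row in report.values())
weighted_f1 = sum(f1[label] * row["support"] for label, row in report.items()) / total_support

for label, value in f1.items():
    print(f"F1 for class {label}: {value:.2f}")
print(f"macro avg F1: {macro_f1:.2f}, weighted avg F1: {weighted_f1:.2f}")
```

Because the two classes have similar support (235 versus 259), the macro and weighted averages land on nearly the same value.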

