## End-to-End Machine Learning Project (Email Spam Detection)
A typical **supervised learning** task is classification. A spam filter is a good example:
it is trained on many example emails, each labeled with its class (spam or ham),
and it must learn to classify new emails.



## Create environment and install packages  
```console
conda create -n sklearn-envs -c conda-forge --offline
conda activate sklearn-envs
conda install ipykernel pandas scikit-learn-intelex scikit-learn -c conda-forge --offline
```

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd


In [None]:
df = pd.read_csv("mail_data.csv")
df.head()

### Replace null values with empty strings

In [None]:
data = df.where((pd.notnull(df)), '')
data.shape
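
The `where` call above keeps every non-null cell and substitutes an empty string elsewhere, so no rows are dropped. A minimal sketch on a made-up frame (the rows below are illustrative and not taken from `mail_data.csv`) suggests that `fillna('')` expresses the same step more directly:

```python
import pandas as pd

# Hypothetical toy frame standing in for mail_data.csv
toy = pd.DataFrame({
    "Category": ["ham", "spam", None],
    "Message": ["see you at 5", None, "win a prize now"],
})

# Count missing cells before cleaning
missing_before = int(toy.isnull().sum().sum())

# Two equivalent ways to replace nulls with empty strings
via_where = toy.where(pd.notnull(toy), "")
via_fillna = toy.fillna("")

print(missing_before)
print(via_where.equals(via_fillna))
print(via_fillna.shape == toy.shape)  # the row count is unchanged
```

Rows with an empty `Message` still reach the vectorizer, so dropping them with `dropna` may be worth considering if they turn out to be common.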

### Conversion of text labels to numerical labels

In [None]:
data.loc[data['Category'] == 'spam', 'Category'] = 1
data.loc[data['Category'] == 'ham', 'Category'] = 0
data.head(5)

In [None]:
X = data['Message']
y = data['Category']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)
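
The split above is purely random, so the share of spam in the test set can drift from the share in the full data. As a hedged sketch, `train_test_split` also accepts a `stratify` argument that keeps the class ratio the same in both parts. The labels below are invented for illustration (16 ham, 4 spam), and `test_size=0.25` is chosen only so the counts divide evenly:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 16 ham (0) and 4 spam (1) messages
toy_messages = [f"message {i}" for i in range(20)]
toy_labels = [0] * 16 + [1] * 4

# stratify=toy_labels keeps the 80/20 ham/spam ratio in both splits
m_train, m_test, l_train, l_test = train_test_split(
    toy_messages, toy_labels,
    test_size=0.25, random_state=3, stratify=toy_labels,
)

print(len(m_train), len(m_test))
print(sum(l_train), sum(l_test))  # spam counts in train and test
```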

In [None]:
vectorizer = TfidfVectorizer(min_df=2, stop_words='english', lowercase=True)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
Y_train = y_train.astype('int')
Y_test = y_test.astype('int')
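
To make the vectorization step less of a black box, here is a minimal hand-rolled sketch of TF-IDF weighting in plain Python. It assumes the smoothed IDF formula that scikit-learn documents for its default settings, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization. It deliberately ignores tokenization details, stop words, and `min_df`, and the three documents are made up:

```python
import math
from collections import Counter

# Hypothetical toy corpus, whitespace-tokenized for simplicity
docs = [
    "win free prize now",
    "free entry win cash",
    "see you at lunch",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

def idf(term):
    # Smoothed inverse document frequency
    return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1

def tfidf_vector(tokens):
    # Term count times IDF, then scale the vector to unit L2 length
    counts = Counter(tokens)
    raw = {t: c * idf(t) for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()}

vec = tfidf_vector(tokenized[0])
for term, weight in sorted(vec.items()):
    print(f"{term}: {weight:.3f}")
```

Terms that occur in fewer documents ("prize", "now") receive a larger weight than terms shared across documents ("win", "free"), which is the property the classifier relies on.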



# Train

In [None]:
clf = LogisticRegression()
clf.fit(X_train_tfidf, Y_train)


# Accuracy on the Training Set

In [None]:
predictions = clf.predict(X_train_tfidf)
accuracy = accuracy_score(Y_train, predictions)
print(accuracy)

# Accuracy on the Test Set

In [None]:
predictions_on_test = clf.predict(X_test_tfidf)
accuracy_on_test = accuracy_score(Y_test, predictions_on_test)
print(accuracy_on_test)
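
Accuracy alone can look good even when many spam messages slip through, especially if ham outnumbers spam. The sketch below computes precision and recall by hand on invented labels (not results from this notebook) to show how the three numbers can diverge. The helper name `confusion_counts` is hypothetical:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the spam class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

# Made-up labels: 8 ham, 2 spam; the model misses one spam message
toy_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(toy_true, toy_pred)
toy_accuracy = (tp + tn) / len(toy_true)
toy_precision = tp / (tp + fp) if (tp + fp) else 0.0
toy_recall = tp / (tp + fn) if (tp + fn) else 0.0

print(toy_accuracy, toy_precision, toy_recall)
```

In this toy case accuracy is 0.9 while recall is only 0.5, meaning half of the spam went undetected. On the real data, `precision_score` and `recall_score` from `sklearn.metrics` report the same quantities.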

# Test the Model

In [None]:
input_your_mail = ["Free english in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's"]
# input_your_mail = ["Nah I don't think he goes to usf, he lives around here though"]

input_data = vectorizer.transform(input_your_mail)
predictions = clf.predict(input_data)
if predictions[0] == 1:
    print("spam")
else:
    print("ham")
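
The vectorizer and classifier must always be applied in the same order, so one option is to bundle them into a single scikit-learn pipeline. The following is a sketch under stated assumptions, not the notebook's own method: the training messages are invented, the helper `classify` is hypothetical, and default vectorizer settings are used because `min_df=2` would discard most terms in such a tiny corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data (1 = spam, 0 = ham)
toy_messages = [
    "win a free prize now",
    "free entry to win cash",
    "claim your free reward today",
    "are we still meeting for lunch",
    "call me when you get home",
    "see you at the office tomorrow",
]
toy_labels = [1, 1, 1, 0, 0, 0]

# The pipeline applies TF-IDF and then logistic regression in one object
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(toy_messages, toy_labels)

def classify(text):
    """Return 'spam' or 'ham' for a single raw message string."""
    return "spam" if model.predict([text])[0] == 1 else "ham"

result = classify("free prize waiting for you")
print(result)
```

With a pipeline, raw strings go in and labels come out, so there is no risk of calling `fit_transform` on new messages by mistake.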