# UFO sightings

## Data

The original data are reports of more than 80,000 UFO sightings spanning more than 50 years, originally obtained from [here](https://github.com/planetsig/ufo-reports). We are already familiar with this dataset because we used it to practice data preparation and dimensionality reduction techniques.

### Final data

In this exercise, we use the dataset from the earlier dimensionality reduction exercise in Week 4 Day 1. If you didn't export the table back then, you can download our pre-prepared dataset from [here](https://drive.google.com/file/d/1Q0gj7_DK2Sz-se8hf5-luu2GESDCsndb/view?usp=sharing).

In [1]:
import pandas as pd

In [33]:
X = pd.read_csv("df_prepared.csv")


In [34]:
# One column has dtype object
X.dtypes[X.dtypes == "object"]

ast_is_dangerous    object
dtype: object

In [36]:
# In our data, one variable mixes the values 0 and 1 with the strings "True" and "False".
# This was a mistake in our data preparation and should have been fixed there.

In [37]:
X.loc[X.ast_is_dangerous == "True", "ast_is_dangerous"] = 1
X.loc[X.ast_is_dangerous == "False", "ast_is_dangerous"] = 0

In [38]:
X.ast_is_dangerous = X.ast_is_dangerous.astype(float)
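
The same cleanup can also be written as a single explicit mapping, which has the advantage of flagging values we did not anticipate. The following is only a sketch on a made-up toy column: the `toy` frame and the `mapping` dictionary are illustrative names, and it assumes the mixed values are stored as strings.

```python
import pandas as pd

# Toy stand-in for the mixed-type column; the values are invented for illustration.
toy = pd.DataFrame({"ast_is_dangerous": ["True", "0", "False", "1", "True"]})

# Map every string spelling to a number; anything not listed becomes NaN.
mapping = {"True": 1.0, "False": 0.0, "1": 1.0, "0": 0.0}
cleaned = toy["ast_is_dangerous"].astype(str).map(mapping)

# Fail loudly if some value was not covered by the mapping.
if cleaned.isna().any():
    raise ValueError("unexpected value in ast_is_dangerous")

toy["ast_is_dangerous"] = cleaned
print(toy["ast_is_dangerous"].tolist())
```

Unlike two separate `.loc` assignments followed by `astype(float)`, an unmapped spelling such as `"yes"` would raise an error here instead of failing later in the conversion.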

### Regression Task
 
Predict the **duration_seconds** of each UFO sighting from the other predictors in the dataset.
- Use Lasso and Ridge regression and find the optimal **alpha** using `GridSearchCV`.
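
For reference, these are the objectives the two estimators minimize, as we understand the scikit-learn documentation. Here $n$ is the number of samples, $X$ the predictor matrix, $y$ the target and $w$ the coefficient vector; the intercept is left out for brevity.

$$
\text{Ridge:}\quad \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso:}\quad \min_{w}\ \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

A larger **alpha** shrinks the coefficients more strongly, and with alpha = 0 both penalties vanish, so both models reduce to ordinary least squares.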

In [39]:
# Separate the target from the predictors
y = X.duration_seconds
X = X.drop("duration_seconds", axis=1)

In [40]:
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import GridSearchCV

### Ridge

In [66]:
lin_reg = Ridge()
reg = GridSearchCV(
    lin_reg,
    {
        'alpha': [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1],
    },
    cv=5,
    verbose=1,
    n_jobs=2,
    # GridSearchCV maximizes the score, so the MSE is negated
    scoring='neg_mean_squared_error'
)

In [67]:
reg.fit(X,y)

Fitting 5 folds for each of 11 candidates, totalling 55 fits


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    5.4s
[Parallel(n_jobs=2)]: Done  55 out of  55 | elapsed:    6.2s finished


GridSearchCV(cv=5, error_score=nan,
             estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
                             max_iter=None, normalize=False, random_state=None,
                             solver='auto', tol=0.001),
             iid='deprecated', n_jobs=2,
             param_grid={'alpha': [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
                                   0.9, 1]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='neg_mean_squared_error', verbose=1)

In [64]:
print(reg.best_estimator_)

Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
      random_state=None, solver='auto', tol=0.001)


### Lasso

In [68]:
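lin_reg = Lasso()
reg = GridSearchCV(
    lin_reg,
    {
        # alpha=0 is plain least squares; scikit-learn advises against using Lasso for it
        'alpha': [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1],
    },
    cv=5,
    verbose=1,
    n_jobs=2,
    # GridSearchCV maximizes the score, so the MSE is negated
    scoring='neg_mean_squared_error'
)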

In [69]:
reg.fit(X,y)

Fitting 5 folds for each of 11 candidates, totalling 55 fits


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:  1.1min
[Parallel(n_jobs=2)]: Done  55 out of  55 | elapsed:  1.2min finished


GridSearchCV(cv=5, error_score=nan,
             estimator=Lasso(alpha=1.0, copy_X=True, fit_intercept=True,
                             max_iter=1000, normalize=False, positive=False,
                             precompute=False, random_state=None,
                             selection='cyclic', tol=0.0001, warm_start=False),
             iid='deprecated', n_jobs=2,
             param_grid={'alpha': [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
                                   0.9, 1]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='neg_mean_squared_error', verbose=1)

In [70]:
print(reg.best_estimator_)

Lasso(alpha=1, copy_X=True, fit_intercept=True, max_iter=1000, normalize=False,
      positive=False, precompute=False, random_state=None, selection='cyclic',
      tol=0.0001, warm_start=False)
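
In both searches the selected alpha of 1 is the largest value in the grid, so the optimum may lie outside the range we tried. One possible follow-up is to search a log-spaced grid over several orders of magnitude. The sketch below uses synthetic data rather than the UFO table. The grid bounds, the `StandardScaler` step and `max_iter=10000` are assumptions for illustration, not settings from the exercise.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with a few informative features, only for demonstration.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=20.0, random_state=0
)

# 13 candidate alphas from 0.001 to 1000, evenly spaced on a log scale.
alphas = np.logspace(-3, 3, 13)

best_alphas = {}
for name, model in [("ridge", Ridge()), ("lasso", Lasso(max_iter=10000))]:
    # Scaling inside the pipeline is refit on each training fold,
    # so the validation fold never leaks into the scaler.
    pipe = Pipeline([("scale", StandardScaler()), ("model", model)])
    search = GridSearchCV(
        pipe,
        {"model__alpha": alphas},
        cv=5,
        scoring="neg_mean_squared_error",
    )
    search.fit(X_demo, y_demo)
    best_alphas[name] = search.best_params_["model__alpha"]

print(best_alphas)
```

If the best value again falls on the boundary of the grid, that suggests widening the range further. If the prepared features are already standardized, the scaling step can be dropped.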
