# Safety Prediction Model

Inputs:
- Real-time data from government alerts: Sourced from official government channels and APIs.
- News sources: Aggregated from trusted news websites and feeds.
- Social media: Collected from social media platforms using APIs and scraping tools.
- Environmental sensors: Data gathered from IoT devices and public environmental monitoring systems.

Outputs:
- Safety alerts: Real-time notifications about potential hazards.
- Risk assessment scores: Ratings indicating the level of risk in a particular area.
- Predictive models for potential hazards: Forecasts of possible safety issues.

Usage:
- Provides users with timely safety alerts and recommendations to mitigate risks during travel.

---

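The notebook below prototypes this pipeline on synthetic data: it generates sample records for each of the four input sources, merges them on timestamp and location, and trains a logistic regression classifier to predict the alert level.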

In [5]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder

## Generating Sample Data

In [6]:
# Generate sample data for government alerts
gov_alerts = pd.DataFrame({
    'timestamp': pd.date_range(start='2024-01-01', periods=100, freq='H'),
    'alert_level': np.random.choice(['low', 'medium', 'high'], size=100),
    'description': np.random.choice(['Flood', 'Earthquake', 'Fire', 'Storm'], size=100),
    'location': np.random.choice(['NY', 'LA', 'SF', 'TX', 'FL'], size=100)
})

# Generate sample data for news sources
news_sources = pd.DataFrame({
    'timestamp': pd.date_range(start='2024-01-01', periods=100, freq='H'),
    'headline': np.random.choice(['Flood warning', 'Earthquake hits', 'Fire outbreak', 'Severe storm'], size=100),
    'location': np.random.choice(['NY', 'LA', 'SF', 'TX', 'FL'], size=100),
    'impact': np.random.randint(1, 10, size=100)
})

# Generate sample data for social media
social_media = pd.DataFrame({
    'timestamp': pd.date_range(start='2024-01-01', periods=100, freq='H'),
    'user': np.random.choice(['User1', 'User2', 'User3', 'User4', 'User5'], size=100),
    'post': np.random.choice(['Flood in my area', 'Felt an earthquake', 'Fire in the building', 'Heavy storm'], size=100),
    'location': np.random.choice(['NY', 'LA', 'SF', 'TX', 'FL'], size=100)
})

# Generate sample data for environmental sensors
env_sensors = pd.DataFrame({
    'timestamp': pd.date_range(start='2024-01-01', periods=100, freq='H'),
    'sensor_id': np.random.randint(1000, 2000, size=100),
    'sensor_type': np.random.choice(['temperature', 'humidity', 'air_quality', 'pressure'], size=100),
    'value': np.random.rand(100) * 100,
    'location': np.random.choice(['NY', 'LA', 'SF', 'TX', 'FL'], size=100)
})

print("Government Alerts Data:")
print(gov_alerts.head())
print("\nNews Sources Data:")
print(news_sources.head())
print("\nSocial Media Data:")
print(social_media.head())
print("\nEnvironmental Sensors Data:")
print(env_sensors.head())


Government Alerts Data:
            timestamp alert_level description location
0 2024-01-01 00:00:00        high       Flood       TX
1 2024-01-01 01:00:00      medium  Earthquake       FL
2 2024-01-01 02:00:00        high  Earthquake       SF
3 2024-01-01 03:00:00         low  Earthquake       TX
4 2024-01-01 04:00:00        high        Fire       SF

News Sources Data:
            timestamp       headline location  impact
0 2024-01-01 00:00:00   Severe storm       LA       9
1 2024-01-01 01:00:00   Severe storm       FL       8
2 2024-01-01 02:00:00   Severe storm       SF       3
4 2024-01-01 04:00:00   Severe storm       TX       1

Social Media Data:
            timestamp   user                post location
0 2024-01-01 00:00:00  User1  Felt an earthquake       LA
1 2024-01-01 01:00:00  User4  Felt an earthquake       NY
2 2024-01-01 02:00:00  User4    Flood in my area       TX
3 2024-01-01 03:00:00  User2    Flood in my area       NY
4 2024-01-01 04:00:00  User5    Flood in my ar
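
The sample data above is drawn with `np.random` and no fixed seed, so every run produces different rows and the printed output cannot be reproduced exactly. A minimal sketch of a seeded variant for one of the four sources follows; the helper name `make_gov_alerts` and the default seed are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def make_gov_alerts(n=100, seed=42):
    """Build a synthetic government-alerts frame from a seeded generator.

    The helper name and the default seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)  # local generator: same seed, same rows
    return pd.DataFrame({
        # A Timedelta step yields hourly timestamps without a frequency alias.
        'timestamp': pd.date_range(start='2024-01-01', periods=n,
                                   freq=pd.Timedelta(hours=1)),
        'alert_level': rng.choice(['low', 'medium', 'high'], size=n),
        'description': rng.choice(['Flood', 'Earthquake', 'Fire', 'Storm'], size=n),
        'location': rng.choice(['NY', 'LA', 'SF', 'TX', 'FL'], size=n),
    })

first = make_gov_alerts()
second = make_gov_alerts()
print(first.equals(second))  # the two frames are identical because the seed matches
```

The same pattern would apply to the news, social media, and sensor frames, ideally sharing one generator so a single seed controls the whole dataset.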

## Combining Data Sources

In [7]:
# Merge data on timestamp and location
merged_data = pd.merge(gov_alerts, news_sources, on=['timestamp', 'location'], how='outer')
merged_data = pd.merge(merged_data, social_media, on=['timestamp', 'location'], how='outer')
merged_data = pd.merge(merged_data, env_sensors, on=['timestamp', 'location'], how='outer')

In [9]:
merged_data.head()

             timestamp  alert_level  description  location      headline  impact   user                post  sensor_id  sensor_type      value
0  2024-01-01 00:00:00          NaN          NaN        LA  Severe storm     9.0  User1  Felt an earthquake        NaN          NaN        NaN
1  2024-01-01 00:00:00          NaN          NaN        SF           NaN     NaN    NaN                 NaN     1206.0     pressure  34.294406
2  2024-01-01 00:00:00         high        Flood        TX           NaN     NaN    NaN                 NaN        NaN          NaN        NaN
3  2024-01-01 01:00:00       medium   Earthquake        FL  Severe storm     8.0    NaN                 NaN        NaN          NaN        NaN
4  2024-01-01 01:00:00          NaN          NaN        NY           NaN     NaN  User4  Felt an earthquake        NaN          NaN        NaN
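
Each source draws its locations independently, so many (timestamp, location) pairs appear in only one frame, and the outer merge keeps those rows with missing values in the other sources' columns. The sketch below shows this on two tiny hand-made frames (hypothetical data) and uses the `indicator=True` option of `pd.merge` to label where each row came from.

```python
import pandas as pd

# Two toy sources that agree on only one (timestamp, location) pair.
alerts = pd.DataFrame({
    'timestamp': pd.to_datetime(['2024-01-01 00:00', '2024-01-01 01:00']),
    'location': ['TX', 'FL'],
    'alert_level': ['high', 'medium'],
})
news = pd.DataFrame({
    'timestamp': pd.to_datetime(['2024-01-01 00:00', '2024-01-01 01:00']),
    'location': ['LA', 'FL'],
    'impact': [9, 8],
})

# indicator=True adds a '_merge' column: 'left_only', 'right_only', or 'both'.
toy_merged = pd.merge(alerts, news, on=['timestamp', 'location'],
                      how='outer', indicator=True)
print(toy_merged)
```

Only the FL row at 01:00 is tagged `both`; the TX and LA rows come from a single source each and carry `NaN` in the other source's column.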


In [11]:
# Fill missing values with appropriate methods
merged_data['alert_level'] = merged_data['alert_level'].fillna('unknown')
merged_data['impact'] = merged_data['impact'].fillna(0)
merged_data['post'] = merged_data['post'].fillna('No post')
merged_data['sensor_type'] = merged_data['sensor_type'].fillna('unknown')

print("Merged Data:")
merged_data.head()

Merged Data:


             timestamp  alert_level  description  location      headline  impact   user                post  sensor_id  sensor_type      value
0  2024-01-01 00:00:00      unknown          NaN        LA  Severe storm     9.0  User1  Felt an earthquake        NaN      unknown        NaN
1  2024-01-01 00:00:00      unknown          NaN        SF           NaN     0.0    NaN             No post     1206.0     pressure  34.294406
2  2024-01-01 00:00:00         high        Flood        TX           NaN     0.0    NaN             No post        NaN      unknown        NaN
3  2024-01-01 01:00:00       medium   Earthquake        FL  Severe storm     8.0    NaN             No post        NaN      unknown        NaN
4  2024-01-01 01:00:00      unknown          NaN        NY           NaN     0.0  User4  Felt an earthquake        NaN      unknown        NaN
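
The fill step covers `alert_level`, `impact`, `post`, and `sensor_type`, but the output still shows gaps in `description`, `headline`, `user`, `sensor_id`, and `value`. A small check like the following makes the remaining gaps explicit; `missing_report` is a hypothetical helper, shown here on a toy frame rather than on `merged_data`.

```python
import numpy as np
import pandas as pd

def missing_report(df):
    """Count missing values per column, largest first, skipping complete columns."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)

# Toy frame that mimics the state after a partial fill.
toy_filled = pd.DataFrame({
    'alert_level': ['unknown', 'high', 'unknown'],
    'description': [np.nan, 'Flood', np.nan],
    'impact': [9.0, 0.0, 0.0],
    'value': [np.nan, np.nan, 34.29],
})
print(missing_report(toy_filled))
```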


In [12]:
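# Encode categorical variables.
# 'description' still contains NaN after the merge; fill it first, since some
# scikit-learn versions reject a column that mixes strings and NaN.
merged_data['description'] = merged_data['description'].fillna('unknown')

# Fit one encoder per column so each fitted mapping stays available
# (for example for inverse_transform) instead of being overwritten.
encoders = {}
for col in ['location', 'description', 'post', 'sensor_type']:
    encoders[col] = LabelEncoder()
    merged_data[col] = encoders[col].fit_transform(merged_data[col])

# Define features and target.
# Note: 'description' comes from the same government-alert rows as the target,
# so it is missing exactly when alert_level is 'unknown'.
X = merged_data[['location', 'description', 'impact', 'post', 'sensor_type']]
y = merged_data['alert_level']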

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
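
`LabelEncoder` maps each category to an arbitrary integer, and logistic regression then treats those integers as ordered magnitudes (for example, `SF` > `LA`). One common alternative is one-hot encoding. The following is a minimal sketch using scikit-learn's `ColumnTransformer`, `OneHotEncoder`, and `Pipeline` on a hand-made toy frame; the toy rows are hypothetical, and leaving `description` out of the features is an assumption of this sketch, made because that column originates from the same source as the target.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

categorical_cols = ['location', 'post', 'sensor_type']

# One-hot encode the categorical columns; pass 'impact' through unchanged.
preprocess = ColumnTransformer(
    transformers=[('cat', OneHotEncoder(handle_unknown='ignore'), categorical_cols)],
    remainder='passthrough',
)
clf = Pipeline([
    ('preprocess', preprocess),
    ('model', LogisticRegression(max_iter=1000)),
])

# Hypothetical toy rows, only to show that the pipeline fits and predicts.
toy = pd.DataFrame({
    'location': ['NY', 'LA', 'SF', 'NY', 'LA', 'SF'],
    'post': ['Flood in my area', 'No post', 'Heavy storm',
             'No post', 'Flood in my area', 'Heavy storm'],
    'sensor_type': ['pressure', 'unknown', 'humidity',
                    'unknown', 'pressure', 'humidity'],
    'impact': [9, 0, 3, 0, 8, 2],
    'alert_level': ['high', 'low', 'medium', 'low', 'high', 'medium'],
})
X_toy = toy[categorical_cols + ['impact']]
clf.fit(X_toy, toy['alert_level'])
toy_preds = clf.predict(X_toy)
print(toy_preds)
```

Keeping the encoder inside the pipeline also means it is fitted on the training split only when the pipeline is trained on `X_train`, which avoids encoding categories learned from the test rows.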

## Building Predictive Model

In [13]:
# Initialize the model
model = LogisticRegression(max_iter=1000)

# Train the model
model.fit(X_train, y_train)
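
The outputs listed at the top include risk assessment scores, which the notebook does not compute. One possible way to derive a score from a fitted classifier is an expected severity over the predicted class probabilities. The sketch below assumes hypothetical severity weights (`low`=1, `medium`=2, `high`=3, anything else 0) and a toy dataset; neither comes from the source.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical severity weights; the source does not define a scoring scheme.
SEVERITY = {'low': 1.0, 'medium': 2.0, 'high': 3.0}

def risk_scores(fitted_model, X):
    """Expected severity per row: sum over classes of P(class) * weight."""
    proba = fitted_model.predict_proba(X)  # shape (n_rows, n_classes)
    weights = np.array([SEVERITY.get(label, 0.0) for label in fitted_model.classes_])
    return proba @ weights

# Toy data: the label depends on the first feature only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 2))
y_demo = np.where(X_demo[:, 0] < -0.5, 'low',
                  np.where(X_demo[:, 0] < 0.5, 'medium', 'high'))

demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)
scores = risk_scores(demo_model, X_demo)
print(scores[:5])
```

Because the probabilities in each row sum to 1, the score stays between the smallest and largest weight of the classes the model has seen.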

## Predictions

In [14]:
# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(classification_report(y_test, y_pred))

Accuracy: 0.7627118644067796
              precision    recall  f1-score   support

        high       0.50      0.12      0.20         8
         low       0.17      0.50      0.25         4
      medium       0.50      0.38      0.43         8
     unknown       1.00      1.00      1.00        39

    accuracy                           0.76        59
   macro avg       0.54      0.50      0.47        59
weighted avg       0.81      0.76      0.76        59
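
The headline accuracy is dominated by the `unknown` class. From the report, 39 of the 59 test rows are `unknown` and all are classified correctly; recall times support gives 1 correct `high`, 2 correct `low`, and 3 correct `medium`, so only 6 of the 20 rows with a real alert level are right (0.30), and 45 of 59 overall (0.76). That is unsurprising given that the sample labels are drawn at random. A minimal sketch of an accuracy that leaves out the placeholder class follows; `accuracy_excluding` is a hypothetical helper, shown on toy labels.

```python
import numpy as np

def accuracy_excluding(y_true, y_pred, excluded='unknown'):
    """Accuracy over rows whose true label is not the excluded placeholder."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    mask = y_true != excluded
    if not mask.any():
        return float('nan')  # nothing left to score
    return float((y_true[mask] == y_pred[mask]).mean())

# Toy labels: three placeholder rows predicted correctly, four real rows half right.
toy_true = ['unknown', 'unknown', 'unknown', 'high', 'low', 'medium', 'high']
toy_pred = ['unknown', 'unknown', 'unknown', 'low', 'low', 'high', 'high']

overall = float((np.asarray(toy_true) == np.asarray(toy_pred)).mean())
restricted = accuracy_excluding(toy_true, toy_pred)
print(overall, restricted)
```

In the notebook this could be called as `accuracy_excluding(y_test, y_pred)` to report performance on rows that actually carry a government alert.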

