# ASSOCIATION RULES

#### Association Rule Mining:

• Implement the Apriori algorithm using a tool such as Python, with libraries such as Pandas and Mlxtend.


• Apply association rule mining techniques to the pre-processed dataset to discover interesting relationships between products purchased together.



• Set appropriate thresholds for support, confidence and lift to extract meaningful rules.
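
To make the three thresholds concrete, the sketch below computes support, confidence and lift by hand for a single rule. The toy baskets and the helper names `support` and `rule_metrics` are illustrative assumptions; they are not part of the dataset or of Mlxtend.

```python
# Toy transactions (hypothetical, for illustration only)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_both = support(set(antecedent) | set(consequent), transactions)
    s_ante = support(antecedent, transactions)
    s_cons = support(consequent, transactions)
    confidence = s_both / s_ante   # P(consequent | antecedent)
    lift = confidence / s_cons     # ratio to the consequent's baseline frequency
    return s_both, confidence, lift

rule_support, rule_confidence, rule_lift = rule_metrics({"bread"}, {"milk"}, transactions)
print(f"support={rule_support:.3f} confidence={rule_confidence:.3f} lift={rule_lift:.3f}")
```

A lift below 1, as in this toy rule, means the two items co-occur less often than they would if they were independent; a minimum lift of 1.0 filters such rules out.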


#### Analysis and Interpretation:



• Analyse the generated rules to identify interesting patterns and relationships between the products.



• Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.



In [1]:
pip install pandas mlxtend


Note: you may need to restart the kernel to use updated packages.


In [14]:
import pandas as pd

# Load the CSV file into a DataFrame
data = pd.read_csv("Titanic_train.csv")

# Display the column names
print(data.columns)


Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')


In [15]:
print(data.dtypes)


PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object


In [16]:
print(data.head())


   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  


In [17]:
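df = pd.get_dummies(data)
df.head()

|   | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare | Name_Abbing, Mr. Anthony | Name_Abbott, Mr. Rossmore Edward | Name_Abbott, Mrs. Stanton (Rosa Hunt) | ... | Cabin_F G73 | Cabin_F2 | Cabin_F33 | Cabin_F38 | Cabin_F4 | Cabin_G6 | Cabin_T | Embarked_C | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | 22.0 | 1 | 0 | 7.25 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 2 | 1 | 1 | 38.0 | 1 | 0 | 71.2833 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | 1 | 3 | 26.0 | 0 | 0 | 7.925 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 4 | 1 | 1 | 35.0 | 1 | 0 | 53.1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 5 | 0 | 3 | 35.0 | 0 | 0 | 8.05 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |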


In [18]:
# Dropping unnecessary columns
data = data.drop(columns=['Name', 'Ticket', 'Cabin'])

# Handle missing values (assign back rather than calling fillna with inplace=True on a column selection)
data['Age'] = data['Age'].fillna(data['Age'].mean())
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])
data['Fare'] = data['Fare'].fillna(data['Fare'].mean())


In [20]:
# Convert categorical variables to strings
data['Pclass'] = data['Pclass'].astype(str)
data['Survived'] = data['Survived'].astype(str)
data['SibSp'] = data['SibSp'].astype(str)
data['Parch'] = data['Parch'].astype(str)
data['Sex'] = data['Sex'].astype(str)
data['Embarked'] = data['Embarked'].astype(str)

In [21]:
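from mlxtend.preprocessing import TransactionEncoder

# Build one transaction per passenger: each row's non-missing values become items.
# Values carry no column prefix, so an item such as '0' may come from Survived,
# SibSp or Parch, and the numeric columns (PassengerId, Age, Fare) are included as-is.
transactions = data.apply(lambda row: list(row.dropna().astype(str)), axis=1).tolist()

# One-hot encode the transactions into a boolean DataFrame for apriori
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)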
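
Because the transactions built in this cell contain bare values, an item such as `0` can come from `Survived`, `SibSp` or `Parch`, and the imputed mean age shows up in the rules as the item `29.69911764705882`. One possible way to make the rules easier to read, sketched here under assumptions, is to prefix each value with its column name and to bin continuous columns first. The small DataFrame, the bin edges and the variable names are hypothetical and for illustration only.

```python
import pandas as pd

# Tiny stand-in for the Titanic data (hypothetical rows, illustration only)
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 3],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# Bin the continuous Age column; the bin edges and labels are assumptions
sample["Age"] = pd.cut(sample["Age"], bins=[0, 18, 30, 100],
                       labels=["child", "young_adult", "adult"])

# Turn each row into items such as 'Sex=male' so that values stay unambiguous
labelled = [
    [f"{col}={val}" for col, val in record.items() if pd.notna(val)]
    for record in sample.to_dict("records")
]
print(labelled[0])
```

Lists built this way can be passed to `TransactionEncoder` in the same way as the unlabelled ones. Identifier columns such as `PassengerId` would normally be dropped first, since each value occurs only once.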


In [23]:
from mlxtend.frequent_patterns import apriori, association_rules

# Apply the apriori algorithm
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)

# Generate the association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

# Display the rules
print(rules.head())


           antecedents          consequents  antecedent support  \
0                  (0)  (29.69911764705882)            0.930415   
1  (29.69911764705882)                  (0)            0.198653   
2                  (0)                  (3)            0.930415   
3                  (3)                  (0)            0.557800   
4                  (0)               (male)            0.930415   

   consequent support   support  confidence      lift  leverage  conviction  \
0            0.198653  0.195286    0.209891  1.056572  0.010456    1.014224   
1            0.930415  0.195286    0.983051  1.056572  0.010456    4.105499   
2            0.557800  0.531987    0.571773  1.025050  0.013001    1.032630   
3            0.930415  0.531987    0.953722  1.025050  0.013001    1.503635   
4            0.647587  0.625140    0.671894  1.037535  0.022615    1.074082   

   zhangs_metric  
0       0.769466  
1       0.066816  
2       0.351198  
3       0.055265  
4       0.519893  
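
For intuition about what the `apriori` call does with `min_support`, here is a minimal pure-Python sketch of the level-wise search. It is not Mlxtend's implementation; the function name `apriori_sketch` and the toy transactions are hypothetical.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those that reach the support threshold
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items
    level = frequent({frozenset([item]) for t in transactions for item in t})
    result = dict(level)
    size = 2
    while level:
        prev = set(level)
        # Join step: combine frequent (size-1)-itemsets into size-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, size - 1))
        }
        level = frequent(candidates)
        result.update(level)
        size += 1
    return result

demo = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
found = apriori_sketch(demo, min_support=0.5)
for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

In this toy run the pair `{b, c}` falls below the threshold, so the triple `{a, b, c}` is pruned without being counted. That pruning is what keeps the search tractable when `min_support` is set reasonably high.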


In [24]:
# Sort rules by confidence
rules = rules.sort_values(by='confidence', ascending=False)

# Display top rules
print(rules.head(10))


                      antecedents consequents  antecedent support  \
46         (29.69911764705882, S)         (0)            0.101010   
50      (male, 29.69911764705882)         (0)            0.139169   
1             (29.69911764705882)         (0)            0.198653   
155                  (male, 3, S)         (0)            0.298541   
141  (male, 29.69911764705882, 3)         (0)            0.105499   
42         (29.69911764705882, 3)         (0)            0.152637   
60                      (male, 3)         (0)            0.390572   
5                          (male)         (0)            0.647587   
68                      (male, S)         (0)            0.494949   
64                      (male, C)         (0)            0.106622   

     consequent support   support  confidence      lift  leverage  conviction  \
46             0.930415  0.101010    1.000000  1.074789  0.007029         inf   
50             0.930415  0.136925    0.983871  1.057454  0.007439    4.314254 

In [25]:
import pandas as pd

# Load the dataset
data = pd.read_csv("Titanic_train.csv")

# Dropping unnecessary columns
data = data.drop(columns=['Name', 'Ticket', 'Cabin'])

# Handle missing values (assign back rather than calling fillna with inplace=True on a column selection)
data['Age'] = data['Age'].fillna(data['Age'].mean())
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])
data['Fare'] = data['Fare'].fillna(data['Fare'].mean())

# Convert categorical variables to strings
data['Pclass'] = data['Pclass'].astype(str)
data['Survived'] = data['Survived'].astype(str)
data['SibSp'] = data['SibSp'].astype(str)
data['Parch'] = data['Parch'].astype(str)
data['Sex'] = data['Sex'].astype(str)
data['Embarked'] = data['Embarked'].astype(str)

# Create a list of transactions
transactions = data.apply(lambda row: list(row.dropna().astype(str)), axis=1).tolist()

# For demonstration, convert this into a DataFrame suitable for association rule mining
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)


In [26]:
from mlxtend.frequent_patterns import apriori, association_rules

# Apply the apriori algorithm
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)

# Generate the association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

# Display the rules
rules_sorted = rules.sort_values(by='confidence', ascending=False)
print(rules_sorted.head(10))


                      antecedents consequents  antecedent support  \
46         (29.69911764705882, S)         (0)            0.101010   
50      (male, 29.69911764705882)         (0)            0.139169   
1             (29.69911764705882)         (0)            0.198653   
155                  (male, 3, S)         (0)            0.298541   
141  (male, 29.69911764705882, 3)         (0)            0.105499   
42         (29.69911764705882, 3)         (0)            0.152637   
60                      (male, 3)         (0)            0.390572   
5                          (male)         (0)            0.647587   
68                      (male, S)         (0)            0.494949   
64                      (male, C)         (0)            0.106622   

     consequent support   support  confidence      lift  leverage  conviction  \
46             0.930415  0.101010    1.000000  1.074789  0.007029         inf   
50             0.930415  0.136925    0.983871  1.057454  0.007439    4.314254 
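
The top rules by confidence all share the same consequent, so it can help to filter on the consequent before ranking. The sketch below assumes a rules table shaped like the `association_rules` output, with frozenset `antecedents` and `consequents` columns. The rows, the metric values and the item labels such as `Survived=1` are made up for illustration and do not come from the run above.

```python
import pandas as pd

# Hypothetical rules table mimicking the columns Mlxtend returns
rules_demo = pd.DataFrame({
    "antecedents": [
        frozenset({"Sex=female"}),
        frozenset({"Sex=male"}),
        frozenset({"Pclass=1", "Sex=female"}),
        frozenset({"Embarked=S"}),
    ],
    "consequents": [
        frozenset({"Survived=1"}),
        frozenset({"Survived=0"}),
        frozenset({"Survived=1"}),
        frozenset({"Survived=1"}),
    ],
    "confidence": [0.75, 0.80, 0.95, 0.35],
    "lift": [2.0, 1.3, 2.5, 0.9],
})

target = frozenset({"Survived=1"})

# Keep rules that predict the target and clear both thresholds, then rank by lift
is_target = rules_demo["consequents"].apply(lambda items: items == target)
selected = (
    rules_demo[is_target
               & (rules_demo["confidence"] >= 0.7)
               & (rules_demo["lift"] > 1.0)]
    .sort_values("lift", ascending=False)
    .reset_index(drop=True)
)
print(selected[["antecedents", "confidence", "lift"]])
```

Ranking by lift rather than confidence avoids favouring rules whose consequent is simply very common in the data.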

#### Insights into Customer (Passenger) Behavior

Gender and Survival: Female passengers had a significantly higher likelihood of survival. This pattern aligns with the historical context of the "women and children first" policy during the Titanic disaster.


Class and Survival: First-class passengers were more likely to survive. This indicates a disparity in survival chances based on ticket class, reflecting the socioeconomic differences and possibly the physical location and accessibility of lifeboats.


Specific Group Behavior: First-class female passengers and first-class passengers who embarked at port C (Cherbourg) had the highest survival rates. This suggests that certain groups had a significant survival advantage, possibly due to factors such as cabin location, social status, and access to lifeboats.

#### Conclusion

The analysis of the Titanic dataset using association rule mining has uncovered strong associations between survival and factors such as gender, class, and port of embarkation. These insights highlight historical and social dynamics that influenced survival chances during the Titanic disaster. While the dataset is not ideally suited for traditional market basket analysis, the approach demonstrates the versatility of association rule mining in uncovering patterns and relationships within various types of data.