In [None]:
!pip install pandas mlxtend ipdb
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

%pdb on

# Load the Excel file
file_path = 'IBS Dataset.xlsx'
xls = pd.ExcelFile(file_path)

# Load the abundance data
abundance = pd.read_excel(xls, 'abundance')
abundance.set_index('sample-id', inplace=True)

# Normalize the abundance data
abundance_normalized = abundance.div(abundance.sum(axis=1), axis=0)

# Binarize the data and ensure it's boolean type
abundance_binary = abundance_normalized.gt(0)

# Generate frequent itemsets
frequent_itemsets = apriori(abundance_binary, min_support=0.5, max_len=None, use_colnames=True, verbose=0, low_memory=True)
# Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

# Display the frequent itemsets and association rules
print(frequent_itemsets.head())
print(rules.head())



Collecting ipdb
  Downloading ipdb-0.13.13-py3-none-any.whl (12 kB)
Collecting jedi>=0.16 (from ipython>=7.31.1->ipdb)
  Downloading jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: jedi, ipdb
Successfully installed ipdb-0.13.13 jedi-0.19.1
Automatic pdb calling has been turned ON
    support                                           itemsets
0  0.976608  (d__Bacteria;p__Bacteroidota;c__Bacteroidia;o_...
1  0.758285  (d__Bacteria;p__Firmicutes;c__Clostridia;o__La...
2  0.841131  (d__Bacteria;p__Firmicutes;c__Clostridia;o__La...
3  0.571150  (d__Bacteria;p__Bacteroidota;c__Bacteroidia;o_...
4  0.916179  (d__Bacteria;p__Bacteroidota;c__Bacteroidia;o_...
                                         antecedents  \
0  (d__Bacteria;p__Bacteroidota;c__Bacteroidia;o_...   
1  (d__Bacteria;p__Firmicutes;c__Clostridia;o__La...   
2  (d__Bacteria;p__Bacteroido

-------------------------------------------------------------------------------
--------------------------------------------------------------------------------

The association analysis summarizes co-occurrence patterns of bacterial taxa across all samples in the Irritable Bowel Syndrome (IBS) dataset.
It shows which taxa are prevalent and which tend to be present together.


### Frequent Itemsets

The frequent itemsets indicate combinations of bacterial taxa that are commonly found together in the samples. For example:
```
| support | itemsets                                             |
|---------|------------------------------------------------------|
| 0.976608| (d__Bacteria;p__Bacteroidota;c__Bacteroidia;o_...)  |
```
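This itemset has a support of 0.976608, meaning it is present in 97.66% of all samples. Because `apriori` lists single-taxon itemsets first, this row most likely describes one near-ubiquitous taxon rather than a combination (the name is truncated in the output). Such a prevalent taxon may belong to a core microbiota, but since the itemsets were mined from all samples pooled, the support value alone cannot show whether it is typical of IBS, of healthy individuals, or of both.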
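To make the meaning of support concrete, here is a minimal sketch on toy data. The sample ids and taxon names are hypothetical and the code is dependency-free; it only illustrates the definition, not how `mlxtend` computes it internally.

```python
# Toy presence/absence data: sample id -> set of taxa detected.
# All ids and taxon names here are hypothetical.
samples = {
    "s1": {"taxon_A", "taxon_B"},
    "s2": {"taxon_A"},
    "s3": {"taxon_A", "taxon_B", "taxon_C"},
    "s4": {"taxon_B"},
}


def support(itemset, samples):
    """Fraction of samples in which every taxon of `itemset` is present."""
    itemset = set(itemset)
    hits = sum(1 for taxa in samples.values() if itemset <= taxa)
    return hits / len(samples)


single = support({"taxon_A"}, samples)
pair = support({"taxon_A", "taxon_B"}, samples)
print(f"support(taxon_A) = {single:.2f}")
print(f"support(taxon_A, taxon_B) = {pair:.2f}")
```

The support of a pair can never exceed the support of either member, which is why single-taxon itemsets dominate the top of the list at `min_support=0.5`.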

### Association Rules

The association rules provide more detailed relationships between bacterial taxa. For example:
```
| antecedents                                      | consequents                                       | support  | confidence | lift    | leverage | conviction | zhangs_metric |
|--------------------------------------------------|--------------------------------------------------|----------|------------|---------|----------|------------|---------------|
| (d__Bacteria;p__Bacteroidota;c__Bacteroidia;o_...) | (d__Bacteria;p__Firmicutes;c__Clostridia;o__La...) | 0.822612 | 0.842315   | 1.001409| 0.001157 | 1.007514   | 0.060130      |
```
This rule indicates that among samples containing the `Bacteroidota` antecedent, 84.23% also contain the `Firmicutes` consequent. However, a lift of 1.001409 is essentially 1: the consequent is about as common among these samples as it is overall, so the two are close to statistically independent, and the high confidence mostly reflects how prevalent the consequent is.
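The rule metrics can be reproduced by hand from the support values. The sketch below uses the standard definitions of confidence, lift, leverage and conviction. It assumes that rows 0 and 2 of the frequent itemsets output (support 0.976608 and 0.841131) are this rule's antecedent and consequent; the names are truncated, so that pairing is inferred from the numbers being consistent.

```python
# Support values copied from the printed output above.
# Assumption: itemset rows 0 and 2 are this rule's antecedent and consequent.
support_antecedent = 0.976608
support_consequent = 0.841131
support_rule = 0.822612  # antecedent and consequent present together

# Standard association-rule metrics.
confidence = support_rule / support_antecedent
lift = confidence / support_consequent
leverage = support_rule - support_antecedent * support_consequent
conviction = (1 - support_consequent) / (1 - confidence)

print(f"confidence = {confidence:.6f}")
print(f"lift       = {lift:.6f}")
print(f"leverage   = {leverage:.6f}")
print(f"conviction = {conviction:.6f}")
```

The recomputed values agree with the reported row up to rounding of the inputs. Confidence (about 0.842) is almost identical to the consequent's overall support (about 0.841), which is exactly what a lift near 1 expresses.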

### Interpretation in the Context of IBS

1. **Microbiota Composition**:
   - **High Support Itemsets**: High support values mean these taxa are common across many samples. Because the itemsets were mined from all samples pooled, they describe the cohort as a whole rather than IBS specifically.
   - **Core Microbiota**: Consistently present itemsets may indicate a core group of gut bacteria. Whether that core is altered in IBS can only be judged by comparing the IBS and control groups.

2. **Potential Dysbiosis**:
   - **Association Rules**: The rules highlight specific relationships between bacterial groups. Changes in these relationships (compared to healthy individuals) could indicate dysbiosis (imbalance in the microbial community) associated with IBS.
   - **Altered Associations**: If certain bacterial taxa that usually do not co-occur in healthy individuals are found together in IBS samples, it might suggest a condition-specific alteration in the microbiota.

3. **Bacterial Interactions**:
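   - Confidence and lift together help identify associations between bacterial taxa. Confidence alone is inflated for highly prevalent taxa, so only rules with lift clearly above 1 point to co-occurrence beyond chance. Such rules can hint at how bacterial communities are structured in the gut, although co-occurrence does not by itself demonstrate interaction.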

### Limitations and Further Research

- **Causation vs. Correlation**: The analysis shows associations but does not prove causation. Further experimental studies are needed to identify causative relationships.
- **Comparative Analysis**: Comparing these results with microbiota data from healthy individuals could help identify specific bacterial changes linked to IBS.
- **Functional Insights**: Metagenomic or metabolomic studies could provide functional insights into how these bacterial taxa contribute to IBS symptoms.

In conclusion, the output indicates common bacterial associations in the IBS dataset, which can help in understanding the microbial ecology in IBS.

-------------------------------------------------------------------------------
--------------------------------------------------------------------------------

In [2]:
# Load the Excel file
file_path = 'IBS Dataset.xlsx'
xls = pd.ExcelFile(file_path)

# Load the abundance data
abundance = pd.read_excel(xls, 'abundance')
abundance.set_index('sample-id', inplace=True)

# Normalize the abundance data
abundance_normalized = abundance.div(abundance.sum(axis=1), axis=0)

# Binarize the data and ensure it's boolean type
abundance_binary = abundance_normalized.gt(0)

# Load the metadata sheet
metadata = pd.read_excel(xls, 'sample_ids_and_sentences')
metadata.set_index('sample-id', inplace=True)

# Merge abundance data with metadata
merged_data = pd.merge(abundance_binary, metadata, left_index=True, right_index=True)

# Split the samples by diagnosis group
ibs_group = merged_data[merged_data['Group'] == 'IBS']
control_group = merged_data[merged_data['Group'] == 'Control']

# Every column of the binarized table is one taxon
microbial_columns = abundance_binary.columns

# Compare presence/absence (not abundance) of each taxon between groups.
# Note: a t-test on 0/1 data is only an approximation, and the p-values
# below are not corrected for multiple comparisons.
from scipy.stats import ttest_ind

significant_differences = {}
for column in microbial_columns:
    t_stat, p_value = ttest_ind(ibs_group[column], control_group[column])
    if p_value < 0.05:
        significant_differences[column] = p_value

# Print significant differences
print("Significant differences:")
for bacteria, p_value in significant_differences.items():
    print(f"{bacteria}: p-value = {p_value}")

# Generate frequent itemsets
frequent_itemsets = apriori(abundance_binary, min_support=0.5, max_len=None, use_colnames=True, verbose=0, low_memory=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

# Display the frequent itemsets and association rules
print(frequent_itemsets.head())
print(rules.head())




Significant differences:
d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides: p-value = 0.012641737928569083
d__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter: p-value = 1.971742366336403e-05
d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Prevotellaceae;g__Prevotella_9: p-value = 0.03271102997242126
d__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium: p-value = 0.02196965542552386
d__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Bifidobacteriales;f__Bifidobacteriaceae;g__Bifidobacterium: p-value = 0.01262786342752088
d__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Peptostreptococcaceae;g__uncultured: p-value = 0.00011330337908593914
d__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter: p-value = 0.027666583521733006
d__Bacteria;p__Firmicutes;c__Clostridia;o__Lac

--------------------------------------------------------------------------------------------------------------------------------------------------------------

The output lists bacterial taxa whose presence/absence differs between IBS patients and healthy controls at p < 0.05. Because the t-tests were run on the binarized table, these are differences in prevalence, not in abundance, and the p-values are not corrected for multiple comparisons.
Genera such as `Agathobacter` (p ≈ 2.0e-05) and `Butyricicoccus` show the strongest differences, which makes them candidate biomarkers worth following up rather than established drivers of IBS. If validated in independent cohorts and by functional analysis, such taxa could inform targeted interventions, such as probiotics, and help clarify IBS pathogenesis.
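One t-test is run per taxon column, so some p-values below 0.05 are expected by chance alone. A minimal sketch of the Benjamini-Hochberg adjustment in plain Python is shown here. The function name and the example p-values are hypothetical; in the notebook, the input would have to be the p-value of every tested column, not only those already below 0.05.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four taxa (not taken from the dataset).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  adjusted = {q:.4f}")
```

A taxon would then be reported only if its adjusted value stays below the chosen false discovery rate, for example 0.05.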

 --------------------------------------------------------------------------------------------------------------------------------------------------------------
