**Simpson's Paradox is a statistical phenomenon in which a trend that appears in each of several groups of data disappears or reverses when the groups are combined. It typically arises when a lurking (confounding) variable is distributed unevenly across the groups, which distorts the aggregated result.**

In [None]:
import pandas as pd

In [None]:
# Step 1: Create the dataset
data = {
    'Symptom Severity': ['Mild', 'Mild', 'Severe', 'Severe'],
    'Medication': ['A', 'B', 'A', 'B'],
    'Total Patients': [200, 250, 300, 50],
    'Recovered Patients': [150, 200, 90, 30]
}

In [None]:
df = pd.DataFrame(data)

In [None]:
df

  Symptom Severity Medication  Total Patients  Recovered Patients
0             Mild          A             200                 150
1             Mild          B             250                 200
2           Severe          A             300                  90
3           Severe          B              50                  30


In [None]:
# Step 2: Calculate recovery rates
df['Recovery Rate'] = df['Recovered Patients'] / df['Total Patients'] * 100
print("Recovery rates within each severity group:\n", df)

Recovery rates within each severity group:
   Symptom Severity Medication  Total Patients  Recovered Patients  \
0             Mild          A             200                 150   
1             Mild          B             250                 200   
2           Severe          A             300                  90   
3           Severe          B              50                  30   

   Recovery Rate  
0           75.0  
1           80.0  
2           30.0  
3           60.0  


In [None]:
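# Sanity check: pooled counts for Medication A across both severity groups
print(df.loc[df['Medication'] == 'A', 'Recovered Patients'].sum())
print(df.loc[df['Medication'] == 'A', 'Total Patients'].sum())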

240
500


In [None]:
# Step 3: Calculate overall recovery rates
total_patients_a = df[df['Medication'] == 'A']['Total Patients'].sum()
total_recovered_a = df[df['Medication'] == 'A']['Recovered Patients'].sum()
overall_recovery_rate_a = total_recovered_a / total_patients_a * 100

total_patients_b = df[df['Medication'] == 'B']['Total Patients'].sum()
total_recovered_b = df[df['Medication'] == 'B']['Recovered Patients'].sum()
overall_recovery_rate_b = total_recovered_b / total_patients_b * 100

print(f"\nOverall recovery rate for Medication A: {overall_recovery_rate_a:.2f}%")
print(f"Overall recovery rate for Medication B: {overall_recovery_rate_b:.2f}%")



Overall recovery rate for Medication A: 48.00%
Overall recovery rate for Medication B: 76.67%
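The pooled rates from Step 3 can also be computed in one pass with `groupby`. This is a minimal sketch of an alternative, not a replacement for the cell above; it rebuilds the same dataset so that it runs on its own, and the name `overall` is introduced here for illustration.

```python
import pandas as pd

# Same counts as the dataset created in Step 1
df = pd.DataFrame({
    'Symptom Severity': ['Mild', 'Mild', 'Severe', 'Severe'],
    'Medication': ['A', 'B', 'A', 'B'],
    'Total Patients': [200, 250, 300, 50],
    'Recovered Patients': [150, 200, 90, 30],
})

# Pool the raw counts per medication first, then divide
overall = df.groupby('Medication')[['Total Patients', 'Recovered Patients']].sum()
overall['Recovery Rate'] = overall['Recovered Patients'] / overall['Total Patients'] * 100

print(overall.round(2))
```

Pooling the counts before dividing matters: a plain mean of the two group rates would give 52.5% for A and 70% for B, because it ignores how many patients are in each group.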


In [None]:
# Step 4: Interpret the results
if overall_recovery_rate_a > overall_recovery_rate_b:
    print("\nMedication A appears more effective overall.")
else:
    print("\nMedication B appears more effective overall.")


Medication B appears more effective overall.


In [None]:
df['Symptom Severity'].unique()

array(['Mild', 'Severe'], dtype=object)

In [None]:
for severity in df['Symptom Severity'].unique():
    severity_data = df[df['Symptom Severity'] == severity]
    print(f"Severity: {severity}")
    print(severity_data)


Severity: Mild
  Symptom Severity Medication  Total Patients  Recovered Patients  \
0             Mild          A             200                 150   
1             Mild          B             250                 200   

   Recovery Rate  
0           75.0  
1           80.0  
Severity: Severe
  Symptom Severity Medication  Total Patients  Recovered Patients  \
2           Severe          A             300                  90   
3           Severe          B              50                  30   

   Recovery Rate  
2           30.0  
3           60.0  


In [None]:
# Step 5: Compare the individual severity groups
print("\nWhen broken down by symptom severity:")
for severity in df['Symptom Severity'].unique():
    severity_data = df[df['Symptom Severity'] == severity]
    better_medication = severity_data.loc[severity_data['Recovery Rate'].idxmax(), 'Medication']
    print(f"- In {severity} cases, Medication {better_medication} has a higher recovery rate.")


When broken down by symptom severity:
- In Mild cases, Medication B has a higher recovery rate.
- In Severe cases, Medication B has a higher recovery rate.
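Whether the per-group winners from Step 5 contradict the pooled winner can be checked mechanically. The helper below is a sketch under stated assumptions: `shows_reversal` and its parameter names are hypothetical, it only looks at which medication has the highest rate (it does no significance testing), and ties go to the first row.

```python
import pandas as pd

def shows_reversal(df, group_col, arm_col, total_col, success_col):
    """Return True if one arm wins every group but a different arm wins overall."""
    rates = df[success_col] / df[total_col]
    # Arm with the highest rate inside each group (ties go to the first row)
    group_winners = df.loc[rates.groupby(df[group_col]).idxmax(), arm_col]
    # Arm with the highest rate after pooling counts across groups
    pooled = df.groupby(arm_col)[[total_col, success_col]].sum()
    overall_winner = (pooled[success_col] / pooled[total_col]).idxmax()
    return group_winners.nunique() == 1 and group_winners.iloc[0] != overall_winner

# Same counts as the dataset created in Step 1
df = pd.DataFrame({
    'Symptom Severity': ['Mild', 'Mild', 'Severe', 'Severe'],
    'Medication': ['A', 'B', 'A', 'B'],
    'Total Patients': [200, 250, 300, 50],
    'Recovered Patients': [150, 200, 90, 30],
})

result = shows_reversal(df, 'Symptom Severity', 'Medication',
                        'Total Patients', 'Recovered Patients')
print(result)
```

For this dataset the check comes back false, since Medication B has the higher rate in both severity groups and in the pooled totals.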


**Explanation:**

Step 1: We create a DataFrame with the given data.

Step 2: We calculate the recovery rate for each medication within each symptom severity group.

Step 3: We then calculate the overall recovery rate for each medication across all groups.

Step 4: We determine which medication seems more effective overall.

Step 5: Finally, we compare the recovery rates within each symptom severity group to see if one medication consistently outperforms the other.
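The link between Step 2 and Step 3 is that each medication's overall rate is a patient-weighted average of its per-group rates. The sketch below makes the weights explicit; the column name `Patient Share` and the variables `med_totals` and `weighted` are introduced here for illustration.

```python
import pandas as pd

# Same counts as the dataset created in Step 1
df = pd.DataFrame({
    'Symptom Severity': ['Mild', 'Mild', 'Severe', 'Severe'],
    'Medication': ['A', 'B', 'A', 'B'],
    'Total Patients': [200, 250, 300, 50],
    'Recovered Patients': [150, 200, 90, 30],
})
df['Recovery Rate'] = df['Recovered Patients'] / df['Total Patients'] * 100

# Fraction of each medication's patients that falls in each severity group
med_totals = df.groupby('Medication')['Total Patients'].transform('sum')
df['Patient Share'] = df['Total Patients'] / med_totals

# Weighting each group rate by its patient share reproduces the pooled rate
weighted = (df['Recovery Rate'] * df['Patient Share']).groupby(df['Medication']).sum()

print(df[['Symptom Severity', 'Medication', 'Patient Share']].round(3))
print(weighted.round(2))
```

Severe cases make up 60% of Medication A's patients but only about 17% of Medication B's, so A's overall rate sits much closer to its severe-group rate than B's does.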



Recovery Rates within Each Severity Group:

Mild Symptoms:


*   Medication A: 75%
*   Medication B: 80%

Severe Symptoms:


*   Medication A: 30%
*   Medication B: 60%

Overall Recovery Rates:


*   Medication A: 48%
*   Medication B: 76.67%

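Medication B is more effective overall, and it is also better within both the mild and the severe groups. The direction of the comparison therefore does not reverse, so this dataset does not by itself exhibit Simpson's Paradox; a reversal would require Medication A to win within each severity group while losing overall. What the data do show is the ingredient behind the paradox: severity is a confounder, because 300 of Medication A's 500 patients are severe cases compared with 50 of Medication B's 300, which pulls A's pooled rate down toward its severe-group rate.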
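A reversal means that the medication with the higher recovery rate inside every severity group ends up with the lower rate once the groups are pooled. The sketch below shows one such case. The counts in `demo` are hypothetical, invented purely for illustration, and are not taken from the dataset above.

```python
import pandas as pd

# Hypothetical counts, invented for illustration only
demo = pd.DataFrame({
    'Symptom Severity': ['Mild', 'Mild', 'Severe', 'Severe'],
    'Medication': ['A', 'B', 'A', 'B'],
    'Total Patients': [100, 400, 400, 100],
    'Recovered Patients': [90, 340, 120, 20],
})
demo['Recovery Rate'] = demo['Recovered Patients'] / demo['Total Patients'] * 100

# Within-group rates: one row per severity, one column per medication
by_group = demo.pivot(index='Symptom Severity', columns='Medication', values='Recovery Rate')

# Pooled rates: sum counts per medication, then divide
pooled = demo.groupby('Medication')[['Total Patients', 'Recovered Patients']].sum()
pooled_rate = pooled['Recovered Patients'] / pooled['Total Patients'] * 100

print(by_group)
print(pooled_rate)
```

With these counts Medication A wins within both groups (90% vs 85% for mild, 30% vs 20% for severe) yet loses after pooling (42% vs 72%), because 400 of A's 500 patients are severe cases while 400 of B's 500 are mild.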