<a href="https://colab.research.google.com/github/hafeezjaan77/AMP_Data/blob/main/Efflux_Transporter_Salmonella_Part_1_to_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Hepcidin Antimicrobial Peptide (HAMP) Bioactivity Data**

**Abdul Hafeez**

**Protein:** Hepcidin

**Organism:** Human

# **Functions of HAMP**

Hepcidin is a cationic amphipathic peptide that is synthesized mainly by hepatocytes, released into the plasma,
and excreted in the urine. It acts as both a bactericidal peptide and a homeostatic regulator of intestinal iron
absorption, iron recycling by macrophages, and iron mobilization from hepatic stores. Iron, however, is a principal element required for bacterial growth and escape from a complement attack by induced OmpA expression. These observations suggested a cooperative effect between the antibacterial and iron regulatory activities
of hepcidin in the innate immune defence against bacterial invasion [Ref:https://www.nature.com/articles/s41598-017-04069-x.pdf].


The active C-terminal 25-aa peptide is cleaved from an 84-aa precursor and contains a unique 17-aa stretch with eight cysteines forming four disulphide bridges. HAMP has antimicrobial activity against Gram-positive bacteria and inhibits the growth of certain yeast and Gram-negative species, with activity similar to β-defensin.



# **ChEMBL Database**


**Installing libraries**

Install the ChEMBL web service package so that we can retrieve bioactivity data from the ChEMBL Database.

In [None]:
! pip install chembl_webresource_client

# **Importing libraries**

In [None]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client

# **Search for Target protein**

**Target search for antimicrobial resistance**

In [None]:
# Target search for Antimicrobial Resistance
target = new_client.target
target_query = target.search('antimicrobial resistance')
targets = pd.DataFrame.from_dict(target_query)
targets

**Select and retrieve bioactivity data for HAMP**

We will assign the entry at index 74 of the search results (which corresponds to the target protein, HAMP) to the selected_target variable.

In [None]:
selected_target = targets.target_chembl_id[74]
selected_target

Here, we will retrieve only the bioactivity data for the selected target (CHEMBL3989381) that are reported as IC50 values.



In [None]:
activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50")

In [None]:
df = pd.DataFrame.from_dict(res)

In [None]:
df

Finally, we will save the resulting bioactivity data to the CSV file effluxtransporter_01_bioactivity_data_raw.csv.



In [None]:
df.to_csv('effluxtransporter_01_bioactivity_data_raw.csv', index=False)

# **Handling missing data**

If any compound has a missing value in the standard_value or canonical_smiles column, drop it.

In [None]:
df2 = df[df.standard_value.notna()]
# Build the second mask from df2 so that the mask and the frame share the same index
df2 = df2[df2.canonical_smiles.notna()]
df2

In [None]:
len(df2.canonical_smiles.unique())


In [None]:
df2_nr = df2.drop_duplicates(['canonical_smiles'])
df2_nr
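
As a minimal sketch of the two clean-up steps above, the same filtering can be written with `dropna` and `drop_duplicates` on a toy table. The rows below are made up for illustration and are not ChEMBL records.

```python
import pandas as pd

# Hypothetical toy records (not real ChEMBL data)
toy = pd.DataFrame({
    'molecule_chembl_id': ['A', 'B', 'C', 'D'],
    'canonical_smiles': ['CCO', None, 'CCO', 'CCN'],
    'standard_value': ['120.0', '50.0', '300.0', None],
})

# Drop rows that are missing either column
toy_clean = toy.dropna(subset=['standard_value', 'canonical_smiles'])

# Keep the first row for each unique SMILES string
toy_nr = toy_clean.drop_duplicates(['canonical_smiles'])
```

Rows B and D are dropped for missing values, and row C is dropped as a duplicate of row A's SMILES. Note that `drop_duplicates` keeps the first occurrence, so which measurement survives depends on row order.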

# **Data pre-processing of the bioactivity data**

Combine the 3 columns (molecule_chembl_id, canonical_smiles, standard_value) into a DataFrame. The bioactivity class is added in the labeling step that follows.

In [None]:
selection = ['molecule_chembl_id','canonical_smiles','standard_value']
df3 = df2_nr[selection]
df3

Save the DataFrame to a CSV file.



In [None]:
df3.to_csv('effluxtransporter_02_bioactivity_data_preprocessed.csv', index=False)


# **Labeling compounds as either being active, inactive or intermediate**

The bioactivity data are IC50 values in nM. Compounds with values of 1,000 nM or less are considered active, and those with values of 10,000 nM or more are considered inactive. Values in between (1,000 to 10,000 nM) are referred to as intermediate.

In [None]:
df4 = pd.read_csv('effluxtransporter_02_bioactivity_data_preprocessed.csv')

In [None]:
bioactivity_threshold = []
for i in df4.standard_value:
  if float(i) >= 10000:
    bioactivity_threshold.append("inactive")
  elif float(i) <= 1000:
    bioactivity_threshold.append("active")
  else:
    bioactivity_threshold.append("intermediate")

In [None]:
bioactivity_class = pd.Series(bioactivity_threshold, name='class')
df5 = pd.concat([df4, bioactivity_class], axis=1)
df5
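
The thresholds can also be wrapped in a small helper, sketched below. `classify_ic50` is a hypothetical name that is not part of the original notebook; the boundary handling follows the loop above (1,000 nM counts as active, 10,000 nM as inactive).

```python
def classify_ic50(value_nm):
    """Return the bioactivity class for an IC50 value given in nM."""
    value_nm = float(value_nm)
    if value_nm >= 10000:
        return "inactive"
    if value_nm <= 1000:
        return "active"
    return "intermediate"

# Illustrative values only, including both boundary cases
labels = [classify_ic50(v) for v in [250, 1000, 5000, 10000, 42000]]
```

With such a helper, the labeling step could be written as `df4.standard_value.apply(classify_ic50)`, which avoids building the list by hand.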

Save the DataFrame to a CSV file.



In [None]:
df5.to_csv('effluxtransporter_03_bioactivity_data_curated.csv', index=False)


In [None]:
! zip effluxtransporter.zip *.csv


In [None]:
! ls -l

# **PART-2 Exploratory Data Analysis**

**In Part 2, we will be performing Descriptor Calculation and Exploratory Data Analysis.**



# **Install conda and rdkit**

In [None]:
! wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
! chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh
! bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local
! conda install -c rdkit rdkit -y
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

# **Load bioactivity data**

In [None]:
! wget https://github.com/hafeezjaan77/AMP_Data/raw/main/effluxtransporter_03_bioactivity_data_curated.csv

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('effluxtransporter_03_bioactivity_data_curated.csv')
df

In [None]:
df_no_smiles = df.drop(columns='canonical_smiles')


In [None]:
smiles = []

for i in df.canonical_smiles.tolist():
  cpd = str(i).split('.')
  cpd_longest = max(cpd, key = len)
  smiles.append(cpd_longest)

smiles = pd.Series(smiles, name = 'canonical_smiles')

In [None]:
df_clean_smiles = pd.concat([df_no_smiles,smiles], axis=1)
df_clean_smiles

# **Calculate Lipinski descriptors**

Christopher Lipinski, a scientist at Pfizer, came up with a set of rules of thumb for evaluating the druglikeness of compounds. Druglikeness here is based on Absorption, Distribution, Metabolism and Excretion (ADME), also known as the pharmacokinetic profile. Lipinski analyzed orally active drugs to formulate what is now known as the Rule of Five, or Lipinski's Rule.

**The Lipinski's Rule stated the following:**

- Molecular weight < 500 Dalton
- Octanol-water partition coefficient (LogP) < 5
- Hydrogen bond donors < 5
- Hydrogen bond acceptors < 10
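
As a rough sketch, the four criteria can be checked with a small function. `passes_rule_of_five` is a hypothetical helper, the inequalities are strict exactly as listed above, and the descriptor values are illustrative rather than taken from the data set.

```python
def passes_rule_of_five(mw, logp, h_donors, h_acceptors):
    """Check the four criteria listed above (strict inequalities as stated)."""
    return mw < 500 and logp < 5 and h_donors < 5 and h_acceptors < 10

# Illustrative descriptor values, not taken from the data set
small_polar = passes_rule_of_five(mw=180.2, logp=1.3, h_donors=1, h_acceptors=3)
large_greasy = passes_rule_of_five(mw=720.9, logp=6.1, h_donors=2, h_acceptors=8)
```

The first compound satisfies all four criteria, while the second fails on molecular weight and LogP.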

**Import libraries**

In [None]:
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

**Calculate descriptors**

In [None]:
# Inspired by: https://codeocean.com/explore/capsules?query=tag:data-curation

def lipinski(smiles, verbose=False):
    # Compute the four Lipinski descriptors for each SMILES string
    rows = []
    for elem in smiles:
        mol = Chem.MolFromSmiles(elem)
        rows.append([Descriptors.MolWt(mol),
                     Descriptors.MolLogP(mol),
                     Lipinski.NumHDonors(mol),
                     Lipinski.NumHAcceptors(mol)])

    # One row per molecule, stored as floats (works for a single molecule too)
    baseData = np.array(rows, dtype=float)

    columnNames = ["MW", "LogP", "NumHDonors", "NumHAcceptors"]
    descriptors = pd.DataFrame(data=baseData, columns=columnNames)

    return descriptors

In [None]:
df_lipinski = lipinski(df_clean_smiles.canonical_smiles)
df_lipinski

# **Combine DataFrames**

Let's take a look at the 2 DataFrames that will be combined.

In [None]:
df_lipinski

In [None]:
df

Now, let's combine the 2 DataFrame



In [None]:
df_combined = pd.concat([df,df_lipinski], axis=1)


In [None]:
df_combined

# **Convert IC50 to pIC50**

To allow IC50 data to be more uniformly distributed, we will convert IC50 to the negative logarithmic scale which is essentially -log10(IC50).

This custom function pIC50() will accept a DataFrame as input and will:

Take the IC50 values from the standard_value_norm column and convert them from nM to M by multiplying by $10^{-9}$

Take the molar value and apply -log10

Delete the standard_value_norm column and create a new pIC50 column

In [None]:
# https://github.com/chaninlab/estrogen-receptor-alpha-qsar/blob/master/02_ER_alpha_RO5.ipynb

import numpy as np

def pIC50(input):
    pIC50 = []

    for i in input['standard_value_norm']:
        molar = i*(10**-9) # Converts nM to M
        pIC50.append(-np.log10(molar))

    input['pIC50'] = pIC50
    x = input.drop(columns='standard_value_norm')  # positional axis is not accepted by recent pandas
        
    return x

Point to note: Values greater than 100,000,000 nM will be capped at 100,000,000 nM. Without a cap, values above 1,000,000,000 nM (1 M) would give a negative pIC50; with the cap, pIC50 never falls below 1.
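
The capping and conversion can be sketched together in vectorized form. `to_pic50` and its `cap_nm` parameter are hypothetical names used only for this illustration; the notebook itself uses the two functions norm_value() and pIC50().

```python
import numpy as np

def to_pic50(ic50_nm, cap_nm=1e8):
    """Cap IC50 values (nM) at cap_nm, convert to molar, and return -log10."""
    capped = np.minimum(np.asarray(ic50_nm, dtype=float), cap_nm)
    return -np.log10(capped * 1e-9)

# 1,000 nM and 10,000 nM are the class thresholds; 1e10 nM exceeds the cap
values = to_pic50([1000, 10000, 1e10])
```

The two thresholds map to pIC50 values of 6 and 5, and the out-of-range value is capped so that its pIC50 is 1 instead of -1.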



In [None]:
df_combined.standard_value.describe()


In [None]:
-np.log10( (10**-9)* 100000000 )


In [None]:
-np.log10( (10**-9)* 10000000000 )


In [None]:
def norm_value(input):
    norm = []

    for i in input['standard_value']:
        if i > 100000000:
          i = 100000000
        norm.append(i)

    input['standard_value_norm'] = norm
    x = input.drop(columns='standard_value')  # positional axis is not accepted by recent pandas
        
    return x

We will first apply the norm_value() function so that the values in the standard_value column are normalized (capped at 100,000,000).



In [None]:
df_norm = norm_value(df_combined)
df_norm

In [None]:
df_norm.standard_value_norm.describe()

In [None]:
df_final = pIC50(df_norm)
df_final

In [None]:
df_final.pIC50.describe()

Let's write this to CSV file.

In [None]:
df_final.to_csv('effluxtransporter_04_bioactivity_data_3class_pIC50.csv')

**Removing the 'intermediate' bioactivity class**

Here, we will be removing the intermediate class from our data set.

In [None]:
df_2class = df_final[df_final['class'] != 'intermediate']
df_2class

Let's write this to CSV file.

In [None]:
df_2class.to_csv('effluxtransporter_05_bioactivity_data_2class_pIC50.csv')

# **Exploratory Data Analysis (Chemical Space Analysis) via Lipinski descriptors**

**Import library**

In [None]:
import seaborn as sns
sns.set(style='ticks')
import matplotlib.pyplot as plt

**Frequency plot of the 2 bioactivity classes**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.countplot(x='class', data=df_2class, edgecolor='black')

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('Frequency', fontsize=14, fontweight='bold')

plt.savefig('plot_bioactivity_class.pdf')

**Scatter plot of MW versus LogP**

The scatter plot of MW vs LogP shows that the 2 bioactivity classes span similar chemical spaces.

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.scatterplot(x='MW', y='LogP', data=df_2class, hue='class', size='pIC50', edgecolor='black', alpha=0.7)

plt.xlabel('MW', fontsize=14, fontweight='bold')
plt.ylabel('LogP', fontsize=14, fontweight='bold')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0)
plt.savefig('plot_MW_vs_LogP.pdf')

# **Box plots**

**pIC50 value**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'class', y = 'pIC50', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('pIC50 value', fontsize=14, fontweight='bold')

plt.savefig('plot_ic50.pdf')

**Statistical analysis | Mann-Whitney U Test**

In [None]:
def mannwhitney(descriptor, verbose=False):
  # https://machinelearningmastery.com/nonparametric-statistical-significance-tests-in-python/
  from scipy.stats import mannwhitneyu

  # split the descriptor values into actives and inactives
  active = df_2class.loc[df_2class['class'] == 'active', descriptor]
  inactive = df_2class.loc[df_2class['class'] == 'inactive', descriptor]

  # compare samples
  stat, p = mannwhitneyu(active, inactive)
  #print('Statistics=%.3f, p=%.3f' % (stat, p))

  # interpret
  alpha = 0.05
  if p > alpha:
    interpretation = 'Same distribution (fail to reject H0)'
  else:
    interpretation = 'Different distribution (reject H0)'

  # collect the result in a one-row table and save it
  results = pd.DataFrame({'Descriptor':descriptor,
                          'Statistics':stat,
                          'p':p,
                          'alpha':alpha,
                          'Interpretation':interpretation}, index=[0])
  filename = 'mannwhitneyu_' + descriptor + '.csv'
  results.to_csv(filename)

  return results

In [None]:
mannwhitney('pIC50')

**MW**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'class', y = 'MW', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('MW', fontsize=14, fontweight='bold')

plt.savefig('plot_MW.pdf')

In [None]:
mannwhitney('MW')

**LogP**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'class', y = 'LogP', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('LogP', fontsize=14, fontweight='bold')

plt.savefig('plot_LogP.pdf')

**Statistical analysis | Mann-Whitney U Test**



In [None]:
mannwhitney('LogP')

**NumHDonors**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'class', y = 'NumHDonors', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('NumHDonors', fontsize=14, fontweight='bold')

plt.savefig('plot_NumHDonors.pdf')

**Statistical analysis | Mann-Whitney U Test**



In [None]:
mannwhitney('NumHDonors')

**NumHAcceptors**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'class', y = 'NumHAcceptors', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('NumHAcceptors', fontsize=14, fontweight='bold')

plt.savefig('plot_NumHAcceptors.pdf')

In [None]:
mannwhitney('NumHAcceptors')


# **Interpretation of Statistical Results**

**Box Plots**

**pIC50 values**

Looking at the pIC50 values, the actives and inactives display a statistically significant difference. This is expected, since the threshold values (IC50 < 1,000 nM = actives and IC50 > 10,000 nM = inactives, corresponding to pIC50 > 6 = actives and pIC50 < 5 = inactives) were used to define the two classes.
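
The following sketch illustrates why this result is expected by construction: when one group lies entirely above pIC50 6 and the other entirely below pIC50 5, the two samples do not overlap and the test reports a small p-value. The numbers are hypothetical and are not taken from the data set.

```python
from scipy.stats import mannwhitneyu

# Hypothetical pIC50 values: actives lie above 6, inactives below 5
active = [6.2, 6.8, 7.1, 7.5, 8.0]
inactive = [3.9, 4.2, 4.5, 4.7, 4.9]

# Two-sided test of whether the two samples come from the same distribution
stat, p = mannwhitneyu(active, inactive, alternative='two-sided')
significant = p < 0.05
```

A significant pIC50 difference therefore says little about the compounds themselves; the tests on the Lipinski descriptors are the informative ones.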

**Lipinski's descriptors**

All 4 Lipinski descriptors exhibited statistically significant differences between the actives and inactives.

**Zip files**

In [None]:
! zip -r results.zip . -i *.csv *.pdf

# **[Part 3] Descriptor Calculation and Dataset Preparation**

**Download PaDEL-Descriptor**


In [None]:
! wget https://github.com/dataprofessor/bioinformatics/raw/master/padel.zip
! wget https://github.com/dataprofessor/bioinformatics/raw/master/padel.sh

In [None]:
! unzip padel.zip

**Load bioactivity data**

Download the curated ChEMBL bioactivity data that was pre-processed in Parts 1 and 2 of this Bioinformatics Project series. Here we will use the effluxtransporter_04_bioactivity_data_3class_pIC50.csv file, which contains the pIC50 values that we will use to build a regression model.

In [None]:
! wget https://github.com/hafeezjaan77/AMP_Data/raw/main/effluxtransporter_04_bioactivity_data_3class_pIC50.csv

In [None]:
import pandas as pd

In [None]:
df3 = pd.read_csv('effluxtransporter_04_bioactivity_data_3class_pIC50.csv')

In [None]:
df3

In [None]:
selection = ['canonical_smiles','molecule_chembl_id']
df3_selection = df3[selection]
df3_selection.to_csv('molecule.smi', sep='\t', index=False, header=False)

In [None]:
! cat molecule.smi | head -5

In [None]:
! cat molecule.smi | wc -l

**Calculate fingerprint descriptors**

Calculate PaDEL descriptors

In [None]:
! cat padel.sh

In [None]:
! bash padel.sh

In [None]:
! ls -l

**Preparing the X and Y Data Matrices**

X data matrix

In [None]:
df3_X = pd.read_csv('descriptors_output.csv')

In [None]:
df3_X

In [None]:
df3_X = df3_X.drop(columns=['Name'])
df3_X

**Y variable**

Select the pIC50 column (already converted from IC50 in Part 2)

In [None]:
df3_Y = df3['pIC50']
df3_Y

**Combining X and Y variable**

In [None]:
dataset3 = pd.concat([df3_X,df3_Y], axis=1)
dataset3

In [None]:
dataset3.to_csv('effluxtransporter_6_bioactivity_data_3class_pIC50_pubchem_fp.csv', index=False)

# **Let's download the CSV file to your local computer for the Part 3B (Model Building)**.

# **[Part 4] Regression Models with Random Forest**

**1. Import libraries**

In [None]:
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

**2. Load the data set**

In [None]:
! wget https://github.com/hafeezjaan77/AMP_Data/raw/main/effluxtransporter_6_bioactivity_data_3class_pIC50_pubchem_fp.csv

In [None]:
df = pd.read_csv('effluxtransporter_6_bioactivity_data_3class_pIC50_pubchem_fp.csv')

**3. Input features**

The data set contains 881 input features and 1 output variable (pIC50 values).

**3.1. Input features**

In [None]:
X = df.drop('pIC50', axis=1)
X

**3.2. Output features**

In [None]:
Y = df.pIC50
Y

**3.3. Let's examine the data dimension**

In [None]:
X.shape

In [None]:
Y.shape

**3.4. Remove low variance features**

In [None]:
from sklearn.feature_selection import VarianceThreshold
selection = VarianceThreshold(threshold=(.8 * (1 - .8)))    
X = selection.fit_transform(X)

In [None]:
X.shape
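
The threshold `.8 * (1 - .8)` equals 0.16, which is the variance p(1 - p) of a binary feature that takes the same value in 80% of the samples, so the filter drops fingerprint bits that are nearly constant. The sketch below reproduces that rule with plain NumPy on a made-up matrix; it is an illustration of the idea, not a replacement for `VarianceThreshold`.

```python
import numpy as np

threshold = 0.8 * (1 - 0.8)  # 0.16

# Toy fingerprint matrix: 10 samples x 3 binary features (made up)
col_a = np.array([1] * 9 + [0])      # 90% ones, variance 0.09
col_b = np.array([1] * 5 + [0] * 5)  # 50% ones, variance 0.25
col_c = np.zeros(10, dtype=int)      # constant, variance 0
X_toy = np.column_stack([col_a, col_b, col_c])

# Population variance per column; keep only columns above the threshold
variances = X_toy.var(axis=0)
keep = variances > threshold
X_reduced = X_toy[:, keep]
```

Only the balanced middle column survives, which is why `X.shape` reports far fewer columns after this step.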

# **4. Data split (80/20 ratio)**

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

In [None]:
X_train.shape, Y_train.shape

In [None]:
X_test.shape, Y_test.shape

# **5. Building a Regression Model using Random Forest**

In [None]:
model = RandomForestRegressor(n_estimators=100)
model.fit(X_train, Y_train)
r2 = model.score(X_test, Y_test)
r2

In [None]:
Y_pred = model.predict(X_test)

# **6. Scatter Plot of Experimental vs Predicted pIC50 Values**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(color_codes=True)
sns.set_style("white")

ax = sns.regplot(x=Y_test, y=Y_pred, scatter_kws={'alpha':0.4})
ax.set_xlabel('Experimental pIC50', fontsize='large', fontweight='bold')
ax.set_ylabel('Predicted pIC50', fontsize='large', fontweight='bold')
ax.set_xlim(0, 12)
ax.set_ylim(0, 12)
ax.figure.set_size_inches(5, 5)
plt.show()

# **[Part 5] Comparing Regressors**

**1. Import libraries**

In [None]:
! pip install lazypredict

In [None]:
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
import lazypredict
from lazypredict.Supervised import LazyRegressor

**2. Load the data set**

In [None]:
! wget https://github.com/hafeezjaan77/AMP_Data/raw/main/effluxtransporter_6_bioactivity_data_3class_pIC50_pubchem_fp.csv.1

In [None]:
df = pd.read_csv('effluxtransporter_6_bioactivity_data_3class_pIC50_pubchem_fp.csv')

In [None]:
X = df.drop('pIC50', axis=1)
Y = df.pIC50

**3. Data pre-processing**

In [None]:
# Examine X dimension
X.shape

In [None]:
# Remove low variance features
from sklearn.feature_selection import VarianceThreshold
selection = VarianceThreshold(threshold=(.8 * (1 - .8)))    
X = selection.fit_transform(X)
X.shape

In [None]:
# Perform data splitting using 80/20 ratio
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

**4. Compare ML algorithms**

In [None]:
# Defines and builds the LazyRegressor
clf = LazyRegressor(verbose=0,ignore_warnings=True, custom_metric=None)
models_train,predictions_train = clf.fit(X_train, X_train, Y_train, Y_train)
models_test,predictions_test = clf.fit(X_train, X_test, Y_train, Y_test)

In [None]:
# Performance table of the training set (80% subset)
predictions_train

In [None]:
# Performance table of the test set (20% subset)
predictions_test

**5. Data visualization of model performance**

In [None]:
# Bar plot of R-squared values
import matplotlib.pyplot as plt
import seaborn as sns

#train["R-Squared"] = [0 if i < 0 else i for i in train.iloc[:,0] ]

plt.figure(figsize=(5, 10))
sns.set_theme(style="whitegrid")
ax = sns.barplot(y=predictions_train.index, x="R-Squared", data=predictions_train)
ax.set(xlim=(0, 1))

In [None]:
# Bar plot of RMSE values
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(5, 10))
sns.set_theme(style="whitegrid")
ax = sns.barplot(y=predictions_train.index, x="RMSE", data=predictions_train)
ax.set(xlim=(0, 10))

In [None]:
# Bar plot of calculation time
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(5, 10))
sns.set_theme(style="whitegrid")
ax = sns.barplot(y=predictions_train.index, x="Time Taken", data=predictions_train)
ax.set(xlim=(0, 10))

# **QSAR Model Building**


**Read in data**

In [None]:
import pandas as pd

In [None]:
data = pd.read_csv('effluxtransporter_6_bioactivity_data_3class_pIC50_pubchem_fp.csv', on_bad_lines='skip')

In [None]:
data

In [None]:
X = data.drop(['pIC50'], axis=1)
X

In [None]:
Y = data.iloc[:,-1]
Y

**Remove low variance features**

In [None]:
from sklearn.feature_selection import VarianceThreshold

def remove_low_variance(input_data, threshold=0.1):
    selection = VarianceThreshold(threshold)
    selection.fit(input_data)
    return input_data[input_data.columns[selection.get_support(indices=True)]]

X = remove_low_variance(X, threshold=0.1)
X

In [None]:
X.to_csv('descriptor_list.csv', index = False)

In [None]:
# In the app, use the following to get this same descriptor list
# of 218 variables from the initial set of 881 variables
# Xlist = list(pd.read_csv('descriptor_list.csv').columns)
# X[Xlist]

# **Random Forest Regression Model**

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
model = RandomForestRegressor(n_estimators=500, random_state=42)
model.fit(X, Y)
r2 = model.score(X, Y)
r2

# **Model Prediction**

In [None]:
Y_pred = model.predict(X)
Y_pred

# **Model Performance**

In [None]:
print('Mean squared error (MSE): %.2f'
      % mean_squared_error(Y, Y_pred))
print('Coefficient of determination (R^2): %.2f'
      % r2_score(Y, Y_pred))
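
For reference, the two metrics printed above follow the standard definitions: MSE is the mean of the squared residuals, and R^2 is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes both by hand on made-up numbers; `mse_and_r2` is a hypothetical helper name.

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Return (MSE, R^2) computed from the standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    mse = np.mean(residual ** 2)
    ss_res = np.sum(residual ** 2)                   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return mse, 1 - ss_res / ss_tot

# Made-up values: one of three predictions is off by 1
mse, r2 = mse_and_r2([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Because the model here is scored on the same data it was fitted to, both numbers describe the fit to the training set rather than performance on unseen compounds.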

# **Data Visualization** **(Experimental vs Predicted pIC50 for Training Data)**

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
plt.figure(figsize=(5,5))
plt.scatter(x=Y, y=Y_pred, c="#7CAE00", alpha=0.3)

# Add trendline
# https://stackoverflow.com/questions/26447191/how-to-add-trendline-in-python-matplotlib-dot-scatter-graphs
z = np.polyfit(Y, Y_pred, 1)
p = np.poly1d(z)

plt.plot(Y,p(Y),"#F8766D")
plt.ylabel('Predicted pIC50')
plt.xlabel('Experimental pIC50')

# **Save Model as Pickle Object**

In [None]:
import pickle

In [None]:
# Use a context manager so the file is closed after writing
with open('effluxtransporter_model.pkl', 'wb') as f:
    pickle.dump(model, f)
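
A saved pickle can be read back with `pickle.load`. The sketch below shows the round trip with a plain dictionary standing in for the fitted model and a temporary file standing in for effluxtransporter_model.pkl; both stand-ins are assumptions made so that the example runs on its own.

```python
import os
import pickle
import tempfile

# Stand-in object; in the notebook this would be the fitted model
stand_in = {'n_estimators': 500, 'random_state': 42}

# Write the object to a temporary file
path = os.path.join(tempfile.mkdtemp(), 'example_model.pkl')
with open(path, 'wb') as handle:
    pickle.dump(stand_in, handle)

# Read it back
with open(path, 'rb') as handle:
    restored = pickle.load(handle)
```

When loading a real scikit-learn model this way, the loading environment should use a compatible scikit-learn version, and pickle files should only be loaded from trusted sources.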