# Installation

Materials available in the repository:

*   files for tests and tutorials (dir: **tutorials**)
*   functions (dir: **functions**)
*   template files to be filled (dir: **template_inputs**)
*   guided scripts (**script_***)
*   requirements.txt (for conda installation)






Prerequisites:

*   Python 3 (https://www.python.org/)
*   Conda or Miniconda (https://conda.io/projects/conda/en/latest/user-guide/install/index.html)



In [None]:
# clone the repository via git (or download the zip folder directly from the GitHub page)

git clone https://github.com/qLSLab/microFim.git

In [None]:
# create conda env

conda create --name microFIM --file requirements.txt --channel defaults --channel conda-forge --channel plotly

# Script usage

Guided scripts must be run from the main directory (inside microFIM, after cloning the repository and creating the environment). The scripts are interactive and support auto-completion for ease of use.

We suggest creating a dedicated directory for your project and using it for both inputs and outputs.

In [None]:
python script_1_filtertable.py

This script can be used to filter your otu/taxa table based on a list of samples.
Files required and mandatory instructions:
* otu/esv/taxa table - the column name of OTU or TAXA must be '#ID'
* sample list  - the first row of your sample list must be '#SampleID'

The script will ask you to set the input directory and the two files mentioned above
(the otu/esv/taxa table and the sample list). The file format does not matter at this stage:
the script will ask you for the type of separator.

The output file will be a filtered CSV file saved into the input directory
(in order to allow subsequent analysis).
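
The filtering step described above can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not the code of script_1_filtertable.py: the helper name `filter_table` and the toy data are hypothetical, and the only facts taken from this guide are the two mandatory headers, '#ID' and '#SampleID'.

```python
import csv
import io


def filter_table(table_text, sample_text, sep=","):
    """Keep the '#ID' column plus the columns named in the sample list."""
    table = list(csv.reader(io.StringIO(table_text), delimiter=sep))
    samples = [row[0] for row in csv.reader(io.StringIO(sample_text)) if row]
    header = table[0]
    if header[0] != "#ID":
        raise ValueError("first column of the table must be named '#ID'")
    if samples[0] != "#SampleID":
        raise ValueError("first row of the sample list must be '#SampleID'")
    wanted = samples[1:]
    # Column 0 is '#ID'; the rest are looked up by sample name.
    keep = [0] + [header.index(name) for name in wanted]
    return [[row[i] for i in keep] for row in table]


# Hypothetical toy data, not taken from the repository.
table_text = "#ID,S1,S2,S3\nTaxonA,5,0,2\nTaxonB,0,1,0\n"
sample_text = "#SampleID\nS1\nS3\n"
filtered = filter_table(table_text, sample_text)
print(filtered)
```

A sample that is listed but missing from the table raises a `ValueError` here; the guided script may handle that case differently.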

In [None]:
python script_2_tableconversion.py

This script can be used to convert an otu/esv/taxa table into a list of transactions.
At this stage, do not worry about the format of the input: the script will ask
you which separator is used.

The output will be saved as a list of transactions in the input directory.
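
As a rough sketch of what a list of transactions looks like, the example below builds one transaction per sample, containing the IDs with a non-zero count. It assumes that this is the layout microFIM uses (one line per sample, items separated by spaces, spaces inside IDs replaced with underscores); the function `table_to_transactions` and the data are hypothetical.

```python
def table_to_transactions(header, rows):
    """Return one transaction per sample: the IDs with non-zero counts."""
    transactions = []
    for col in range(1, len(header)):
        items = [
            row[0].replace(" ", "_")  # avoid spaces inside an item name
            for row in rows
            if float(row[col]) != 0
        ]
        transactions.append(items)
    return transactions


# Hypothetical toy table: first column is '#ID', the others are samples.
header = ["#ID", "S1", "S2", "S3"]
rows = [["Taxon A", "5", "0", "2"], ["Taxon B", "0", "1", "3"]]
transactions = table_to_transactions(header, rows)
for transaction in transactions:
    print(" ".join(transaction))
```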

In [None]:
python script_3_microfimcalculation.py

This script calculates microbial patterns.
Files required:
- otu/esv/taxa table previously converted into transactions
- parameter file in .csv format (support, zmin and zmax + type of report);
    a template is available in the tutorial folder
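
To make the role of the three parameters concrete, here is a brute-force sketch of frequent itemset mining. It is not the eclat algorithm that microFIM runs through the fim library; it only shows how a minimum support and the length bounds zmin and zmax restrict the patterns returned. Treating the support as a percentage of transactions is an assumption of this sketch, and the function name is hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsupp, zmin=1, zmax=None):
    """Brute-force search; minsupp is a percentage of transactions."""
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))
    zmax = len(items) if zmax is None else min(zmax, len(items))
    found = {}
    for size in range(zmin, zmax + 1):
        for candidate in combinations(items, size):
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for s in sets if s.issuperset(candidate))
            if 100.0 * count / len(sets) >= minsupp:
                found[candidate] = count
    return found


# Hypothetical toy transactions (one list per sample).
transactions = [["A", "B"], ["A", "B", "C"], ["A"], ["B", "C"]]
patterns = frequent_itemsets(transactions, minsupp=50, zmin=2, zmax=3)
print(patterns)
```

The exhaustive enumeration is only practical for tiny examples, which is why a dedicated algorithm such as eclat is used on real tables.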

In [None]:
python script_4_additionalmeasures.py

This script calculates additional interest measures that can be used
to filter results. Currently, the all-confidence metric is available (see README for details).
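
For orientation, all-confidence is commonly defined as the support of a pattern divided by the largest support among its single items. The sketch below follows that textbook definition; whether microFIM computes it in exactly this way is an assumption (the README is the reference), and `all_confidence_sketch` is a hypothetical helper, not a microFIM function.

```python
def all_confidence_sketch(pattern, transactions):
    """Support of the pattern divided by the highest single-item support."""
    sets = [set(t) for t in transactions]
    pattern_count = sum(1 for s in sets if s.issuperset(pattern))
    max_item_count = max(sum(1 for s in sets if item in s) for item in pattern)
    return pattern_count / max_item_count


# Hypothetical toy transactions (one list per sample).
transactions = [["A", "B"], ["A", "B", "C"], ["A"], ["B", "C"]]
value = all_confidence_sketch(("A", "B"), transactions)
print(round(value, 3))
```

Values close to 1 mean that the items of the pattern almost always occur together, which is what makes the measure useful as a filter.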

In [None]:
python script_5_generatepatterntable.py

This script can be used to create the pattern table.
Inputs:
- pattern results;
- metadata file;
- transactional file.

The output will be saved as a CSV dataframe (with and without
interest measures) in the input directory.
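
The idea of a pattern table can be sketched as a presence/absence matrix with one row per pattern and one column per sample. The example assumes that transactions are listed in the same order as the samples and that a pattern counts as present when all its items occur in the sample; the function name and the data are hypothetical, and the real output also carries the interest measures.

```python
def pattern_table(patterns, transactions, sample_ids):
    """Build rows of 0/1 values: is each pattern contained in each sample?"""
    rows = [["Pattern"] + list(sample_ids)]
    for pattern in patterns:
        presence = [int(set(pattern) <= set(t)) for t in transactions]
        rows.append([" ".join(pattern)] + presence)
    return rows


# Hypothetical toy inputs.
patterns = [("A", "B"), ("B", "C")]
transactions = [["A", "B"], ["A", "B", "C"], ["A"], ["B", "C"]]
sample_ids = ["S1", "S2", "S3", "S4"]
table = pattern_table(patterns, transactions, sample_ids)
for row in table:
    print(row)
```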

In [None]:
# # available from monday 15

python script_6_generateplots.py

# Library usage


microFIM Python functions are organized into thematic sections, to ease the integration of new functions and the further development of the tool. Here we present three scripts that can be run on the test/test1.csv file and the related metadata and parameter files.

*   The first one (named microFIM_example_code_1.py) filters the data table and converts it into a transactional file. To filter, use a metadata file from which the lines of the samples you want to exclude have been removed.
*   The second calculates patterns and creates the pattern table with and without interest measures.
*   The third creates visualizations.



In [None]:
import os
import sys
import pandas as pd
import numpy as np
import csv
from csv import writer
import readline
import re
import string

import fim
import functions.microdir as md
import functions.microfim as mf
import functions.microimport as mi
import functions.microinterestmeasures as mim

import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt

import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.metrics import pairwise_distances
from sklearn import manifold



""" microFIM example code on test/test1.csv files
of microFIM github repository
Input files to run microFIM:
- test1.csv
- metadata_test1.csv
- parameters_test1.csv

"""


# set dir
set_dir = 'test'
data_dir = md.set_inputs_dir(set_dir)
print(data_dir)

# change dir
os.chdir(data_dir)

# import files

metadata = pd.read_csv(os.path.join(data_dir, 'metadata_test1.csv'), header=0, index_col=None)
print(metadata)
data_table_name = 'test1.csv'
data_table = pd.read_csv(os.path.join(data_dir, 'test1.csv'), header=0, index_col=None, engine='python')
print(data_table)
#parameters = pd.read_csv(os.path.join(data_dir, data_table), sep=sep, header=0, index_col=None, engine='python')


# FILTER DATA TABLE VIA SAMPLE METADATA
# convert sample_list into a list
samples = metadata['#SampleID'].to_list()

# extract '#ID' column (otu/taxa) - see Documentation for details
# (named id_column to avoid shadowing the built-in id function)
id_column = data_table[['#ID']]
samples_table = data_table[[*samples]]

# concat datasets
new_data = pd.concat([id_column, samples_table], axis=1)
#print(new_data.info())


# remove rows with zeros
no_zeros = (new_data.iloc[:,1:] != 0).any(axis=1)
new_data = new_data.loc[no_zeros]
print(new_data)


# CONVERT INTO TRANSACTIONAL FILE

file_name = data_table_name.split('.')
print(file_name)

# remove space from ID column
new_data['#ID'] = new_data['#ID'].str.replace(' ','_')

print(new_data)

n_cols = new_data.shape[1] - 1
#print(n_cols)
n_rows = new_data.shape[0] - 1
#print(n_rows)


t_list = mf.write_transactions(n_cols, n_rows, new_data)
#print(t_list)

# save as transaction list
with open(data_dir + '/' + 'transactions_' + file_name[0], 'w') as f:
    wr = csv.writer(f)
    wr.writerows(t_list)

# convert commas in spaces (for the next steps)
# remove old output to clean folder
output = 'transactions_' + file_name[0]

# the file is written without extension, so report its actual name
print(f'\n\n> File converted and saved as {output} in {data_dir}\n\n')

# these last commands must be run in bash. With GNU sed (Linux) no backup file
# is created, so the rm command is not necessary
print(f'\n\n> Now run from your command line in {data_dir}:\n\n \
sed -i -e "s/,/ /g" {output}\n\n \
rm {output}-e\n\n')
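
If you prefer to stay in Python, the comma-to-space conversion can also be done without sed. The following is a minimal sketch, assuming the transaction file is plain text with comma-separated items and no quoted fields; `commas_to_spaces` is a hypothetical helper and is not part of microFIM. The demo runs on a temporary file so that it does not touch your data.

```python
import os
import tempfile


def commas_to_spaces(path):
    """Rewrite a text file in place, replacing every comma with a space."""
    with open(path, "r", newline="") as handle:
        text = handle.read()
    with open(path, "w", newline="") as handle:
        handle.write(text.replace(",", " "))


# Demo on a temporary file with hypothetical content.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "transactions_demo")
    with open(demo, "w", newline="") as handle:
        handle.write("Taxon_A,Taxon_B\nTaxon_B\n")
    commas_to_spaces(demo)
    with open(demo, "r", newline="") as handle:
        converted = handle.read()
print(converted)
```

Because the file is rewritten in place, no backup file is left behind on any platform.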

In [None]:
import os
import sys
import pandas as pd
import numpy as np
import csv
from csv import writer
import readline
import re
import string

import fim
import functions.microdir as md
import functions.microfim as mf
import functions.microimport as mi
import functions.microinterestmeasures as mim

import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt

import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.metrics import pairwise_distances
from sklearn import manifold



""" microFIM example code on test/test1.csv files
of microFIM github repository
Input files to run microFIM:
- test1.csv
- transactional file (can be obtained with microFIM_example_code_1.py)
- metadata_test1.csv
- parameters_test1.csv

Default is itemsets patterns, but also closed and maximal can be calculated.

"""


# set dir
set_dir = 'test'
data_dir = md.set_inputs_dir(set_dir)
print(data_dir)

# change dir
os.chdir(data_dir)

# import files

metadata = pd.read_csv(os.path.join(data_dir, 'metadata_test1.csv'), header=0, index_col=None)
print(metadata)
data_table_name = 'test1.csv'
data_table = pd.read_csv(os.path.join(data_dir, 'test1.csv'), header=0, index_col=None, engine='python')
print(data_table)
par_file = 'parameters_test1.csv'
trans_file = 'transactions_test1'

# import transactions and file with parameters
t = mf.read_transaction(os.path.join(data_dir, trans_file))
print(t)

minsupp, zmin, zmax = mi.itemsets_parameters(data_dir, par_file)
print(minsupp)
print(zmin)
print(zmax)

#sys.exit()

# set fim options
report = '[asS' # mandatory
to_calculate = 'i' # default (can be changed to c or m)


# run eclat (mandatory)
if to_calculate == 'i':
    results = fim.eclat(t, target='s', supp=minsupp, zmin=zmin, report=report)
elif to_calculate == 'c':
    results = fim.eclat(t, target='c', supp=minsupp, zmin=zmin, report=report)
elif to_calculate == 'm':
    results = fim.eclat(t, target='m', supp=minsupp, zmin=zmin, report=report)

print(results)
# define output name
output_file = 'patterns_test1'

# write results
file = open(data_dir + '/' + output_file + '.csv', 'w+', newline ='')
# writing the data into the file
with file:
    write = csv.writer(file)
    write.writerows(results)


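# build both paths with os.path.join so the directory separator is never missing
out_file = os.path.join(data_dir, output_file + '.csv')
new_out_file = os.path.join(data_dir, 'df_' + output_file + '.csv')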
with open(out_file, 'r') as f, open(new_out_file, 'w') as fo:
    for line in f:
        fo.write(line.replace('"', '').replace("'", "").replace('),[', ')/[').replace(')', '').replace('(', '').replace('[', '').replace(']', ''))


## convert itemsets results into a dataframe
df = mf.itemsets_dataframe(new_out_file)
df.to_csv(new_out_file, index=False)

print(df)

print(out_file)
os.remove(out_file)

# new_out_file already contains the full path
print('Results saved as ' + new_out_file + '\n\n')

#sys.exit()

# CALCULATE ADDITIONAL METRICS
# calculate occurrences for each id
frequency = mim.calculate_ids_occurrence(data_table)
print(frequency)

# calculate len of trans_file (the with block closes the file after reading)
with open(os.path.join(data_dir, trans_file), 'r') as f:
    lines_in_file = f.readlines()
#print(lines_in_file)
number_of_lines = float(len(lines_in_file))

#print(number_of_lines)


data_allc_update = mim.all_confidence(df, frequency, number_of_lines)
#print(data_allc_update)


# write file
file_name = 'addm_patterns_test1'

data_allc_update.to_csv(os.path.join(data_dir, 'df_' + file_name + '.csv'), index=False)


print('Results saved as df_' + file_name + '.csv in ' + data_dir + '\n\n')


## GENERATE PATTERN TABLE

col_patterns = mf.set_patterns_for_matching(data_allc_update)
transactional_list = mf.set_transdata_for_matching(data_dir, trans_file)
meta_file = 'metadata_test1.csv'
sep = ','
pattern_table = mf.generate_pattern_occurrences(data_dir, data_allc_update, transactional_list, meta_file, sep)

df_pattern_table = mf.concat_tables(df, pattern_table)
print(df_pattern_table)

# only 0 and 1
df_pattern_table_clean = df_pattern_table.drop(['Samples', 'Support', 'Support(%)', 'Pattern length', 'All-confidence'], axis=1)

#sys.exit()

# save
output_file = 'pattern_table_test'

df_pattern_table.to_csv(os.path.join(data_dir, output_file + '_complete.csv'), index=False)
df_pattern_table_clean.to_csv(os.path.join(data_dir, output_file + '.csv'), index=False)

In [None]:
# available from monday 15

# Integration in QIIME2 framework

## Export taxa tables for microFIM analysis

In [None]:
# activate the env (if you have not installed QIIME2 yet, please see https://docs.qiime2.org/2021.8/getting-started/)

conda activate qiime2-2020.8 # example version


# export biom file from qza

qiime tools export --input-path table.qza --output-path exported-feature-table


# convert biom file to tsv

biom convert -i exported-feature-table/feature-table.biom -o feature-table.tsv --to-tsv

In [None]:
# substitute #OTU ID with #ID

sed -i -e "s/#OTU ID/#ID/g" feature-table.tsv


# remove first row

sed -i '1d' feature-table.tsv


## READY TO BE IMPORTED IN microFIM ##

## Import pattern tables in qza format to perform QIIME2 analysis

Rename the 'Pattern' column to '#OTU ID' before converting.
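
One possible way to prepare the file is sketched below: it renames the 'Pattern' header to '#OTU ID' and switches the separator from comma to tab. It assumes the pattern table is a comma-separated CSV whose header contains a 'Pattern' column; `csv_to_biom_tsv` is a hypothetical helper, shown here on in-memory text rather than on your files.

```python
import csv
import io


def csv_to_biom_tsv(csv_text):
    """Rename the 'Pattern' header to '#OTU ID' and return tab-separated text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = ["#OTU ID" if name == "Pattern" else name for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical pattern table with two patterns and two samples.
csv_text = "Pattern,S1,S2\nA B,1,0\nB C,0,1\n"
tsv_text = csv_to_biom_tsv(csv_text)
print(tsv_text)
```

Writing the returned text to pattern_table_test.tsv would give a file with the name used in the conversion command.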

In [None]:
# convert to biom file

biom convert -i pattern_table_test.tsv \
  -o pattern_table_test.biom --table-type="OTU table" --to-json


# import into qiime2

qiime tools import \
  --input-path pattern_table_test.biom \
  --type 'FeatureTable[Frequency]' \
  --input-format BIOMV100Format \
  --output-path pattern_table_test.qza