# Convert the outFile from HISAT2 to table in Excel

The script converts the output file generated by HISAT2 into a structured table format and writes the results to an Excel file. It parses the input file, extracts relevant data sections, and creates a DataFrame using pandas. The final DataFrame is then saved as an Excel file named output_file.xlsx. Additionally, the script includes comments for clarity and provides instructions on how to merge files from HISAT and install the necessary pandas library.

In [17]:
# To merge the file from HISAT use
#tail -n +1 *.err > concatenated_filename_output.txt

In [12]:
import pandas as pd

# Read input file and split into individual sections
with open('concatenated_Ticks_output.txt') as f:
    data = f.read().split('== ')

# Remove empty first item
data = data[1:]

# Extract data from each section
results = []
for section in data:
    lines = section.strip().split('\n')
    filename = lines[0].strip()
    reads = lines[1].split('reads')[0].strip()
    overall_alignment_rate = lines[-1].split()[0].strip()
    results.append((filename, reads, overall_alignment_rate))

# Convert results to DataFrame and write to Excel
df = pd.DataFrame(results, columns=['Filename', 'Reads', 'Overall Alignment Rate'])
df.to_excel('output_Ticks_Test.xlsx', index=False)


In [13]:
# Read the Excel file generated by the script
df = pd.read_excel('output_Ticks_Test.xlsx')
print(df)

                    Filename      Reads Overall Alignment Rate
0   hisat_36230627_10.err ==   93653313                 46.67%
1   hisat_36230627_11.err ==  149195053                 39.47%
2   hisat_36230627_12.err ==  149195053                 39.47%
3   hisat_36230627_13.err ==   77594407                 44.52%
4   hisat_36230627_14.err ==   77594407                 44.52%
5   hisat_36230627_15.err ==   82108712                 43.15%
6   hisat_36230627_16.err ==   82108712                 43.15%
7    hisat_36230627_1.err ==   73046065                 40.06%
8    hisat_36230627_2.err ==   73046065                 40.06%
9    hisat_36230627_3.err ==   71453766                 41.22%
10   hisat_36230627_4.err ==   71453766                 41.22%
11   hisat_36230627_5.err ==  121872853                 47.82%
12   hisat_36230627_6.err ==  121872853                 47.82%
13   hisat_36230627_7.err ==   92493365                 39.65%
14   hisat_36230627_8.err ==   92493365                

Name: HISAT2_Output_to_Excel_Converter.py
Description: The script converts the output file generated by HISAT2 into a structured table format and writes the results to an Excel file. It parses the input file, extracts relevant data sections, and creates a DataFrame using pandas. The final DataFrame is then saved as an Excel file named output_file.xlsx. Additionally, the script includes comments for clarity and provides instructions on how to merge files from HISAT and install the necessary pandas library.
Author: Zaide Montes
 Institution: Lund University, Pheromone group to run in Rackham, uppmax
#Contact email: zk.montes10@gmail.com
#Date: Implemented on Nov 7, 2022