# **PART 3:** Concatenate TCRex files
In part 2 we split the data into smaller chunks so that each file satisfies the input limit set by the [TCRex](https://tcrex.biodatamining.be/) software. In this tutorial, we merge the TCRex results for those chunks back together. If you did not split the TCRex input file, you can skip this step and move directly to **PART 4: TCRex results & statistics**.

In [1]:
import os
# Set the working directory to the repository directory (adjust this path to your own system)
os.chdir("/home/sebastiaan/PhD/Repositories/book_chapter/")

In [2]:
import pandas as pd

We start by defining two functions for processing the TCRex results. The first function reads a results file correctly. The second function combines the chunks that originate from the same file.

In [3]:
def read_results(folder, file):
    
    """
    Read in a TCRex results file as a pandas dataframe.
    Metadata lines that start with a '#' sign are ignored.
    
    Args:
    - folder: The folder where the TCRex results file is located
    - file: The name of the TCRex results file
    """
    return pd.read_csv(os.path.join(folder, file), sep = "\t", comment = "#")


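def concatenate_data(indir, outdir):
    
    """
    Concatenate the TCRex results of all chunks in a folder into one file.
    
    Args:
    - indir: Path to the folder that contains the TCRex results files (chunks)
      originating from the same input file.
    - outdir: Path to the directory where the concatenated file is saved.
      The file is named after the input folder, with a '.tsv' extension.
    """

    # Get a sorted list of all files in the folder, so the chunk order is reproducible
    files = sorted(os.listdir(indir))
    # Use the folder name as the output file name (normpath drops a trailing slash)
    name = os.path.basename(os.path.normpath(indir))
        
    # Read every chunk and concatenate the resulting dataframes
    all_results = pd.concat(
        objs = [read_results(indir, fn) for fn in files],
        ignore_index = True
        )

    # Make sure the output directory exists and build the path of the new file
    os.makedirs(outdir, exist_ok = True)
    new_file = os.path.join(outdir, '.'.join([name,'tsv']))
    
    # Write results to a new file
    all_results.to_csv(new_file, sep = '\t', index = False)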
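
To see what the `comment = "#"` argument in `read_results` does, the following minimal sketch parses a small in-memory example. The metadata lines and the column names (`cdr3`, `score`) are made up for illustration and are not the actual TCRex output format.

```python
import io
import pandas as pd

# Hypothetical miniature of a results file: the metadata lines and the
# column names are invented for this example, not taken from TCRex.
example = (
    "# metadata line 1\n"
    "# metadata line 2\n"
    "cdr3\tscore\n"
    "CASSLGTDTQYF\t0.91\n"
    "CASSPGQGYEQYF\t0.87\n"
)

# comment="#" tells pandas not to parse anything from a '#' to the end of
# the line, so the two metadata lines are skipped and the header is read
# from the first line that does not start with '#'.
df = pd.read_csv(io.StringIO(example), sep="\t", comment="#")

print(df.shape)
print(list(df.columns))
```

Note that pandas applies the comment character anywhere in a line, so a '#' inside a data field would cut that line short. This sketch assumes the results files only use '#' at the start of metadata lines.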

Now we can apply the `concatenate_data` function that we just wrote to combine the chunks from the same file.

In [None]:
concatenate_data(
    indir = './data/tcrex_out/P1_15',
    outdir = './data/results/tcrex'
    )
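
After merging, it can be useful to check that no rows were lost. The sketch below compares the number of rows in the merged file with the total number of rows in the chunks. The helper names `count_rows` and `check_row_counts` are hypothetical and not part of the tutorial code, and the demonstration runs on made-up toy chunks in a temporary directory.

```python
import os
import tempfile
import pandas as pd


def count_rows(path):
    """Number of data rows in a tab-separated file, ignoring '#' lines."""
    return len(pd.read_csv(path, sep="\t", comment="#"))


def check_row_counts(indir, merged_file):
    """Return True if the merged file has as many rows as all chunks together."""
    n_chunks = sum(count_rows(os.path.join(indir, fn)) for fn in os.listdir(indir))
    return n_chunks == count_rows(merged_file)


# Toy demonstration: two made-up chunks with 2 and 1 data rows
with tempfile.TemporaryDirectory() as tmp:
    indir = os.path.join(tmp, "chunks")
    os.makedirs(indir)
    for i, rows in enumerate([["A\t1", "B\t2"], ["C\t3"]]):
        with open(os.path.join(indir, f"chunk_{i}.tsv"), "w") as fh:
            fh.write("# toy metadata\ncdr3\tscore\n" + "\n".join(rows) + "\n")

    # Merge the toy chunks in the same way as concatenate_data does
    merged = os.path.join(tmp, "merged.tsv")
    pd.concat(
        [pd.read_csv(os.path.join(indir, fn), sep="\t", comment="#")
         for fn in sorted(os.listdir(indir))]
    ).to_csv(merged, sep="\t", index=False)

    rows_match = check_row_counts(indir, merged)

print(rows_match)
```

For the call above, the equivalent check would be `check_row_counts('./data/tcrex_out/P1_15', './data/results/tcrex/P1_15.tsv')`, assuming the input folder contains only the TCRex results chunks.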