# 4. Let's create the input file 🧪
<br>


<div style="line-height: 1.5;">

This *.inp* file will be the basis for creating the *.orcacosmo* file that we will use in the openCOSMO-RS program.

The first step is to generate a *.inp* file with some information about the molecule. Each line should be formatted as follows:

**<span style="color:LightCoral">name_of_molecule</span>  [TAB] <span style="color:Plum">SMILES_of_molecule</span>  [TAB]  <span style="color:LightSteelBlue">*xyz_file*</span>  [TAB]  <span style="color:LightSeaGreen">charge_of_molecule</span>  [TAB]  <span style="color:#DB7093">geometry_optimization</span>**

- **<span style="color:LightCoral">name_of_molecule:</span>** any name like methanol, water, etc.

- **<span style="color:Plum">SMILES_of_molecule:</span>** SMILES is a way to represent a molecule's structure as a string of characters. For methanol it is CO, and for water it is O. Websites like [Pubchem](https://pubchem.ncbi.nlm.nih.gov/docs/structure-search) and [Chemspider](https://www.chemspider.com/) provide SMILES for a wide variety of molecules

- **<span style="color:LightSteelBlue">*xyz_file:*</span>** a text file that describes the geometry of the molecule; it contains the 3D coordinates of every atom in the molecule

- **<span style="color:LightSeaGreen">charge_of_molecule:</span>** the net charge of the molecule: 0 for neutral molecules, positive for cations (+1, +2) and negative for anions (-1, -2)

- **<span style="color:#DB7093">geometry_optimization:</span>** write **TRUE** if the optimization should be performed; otherwise write **FALSE** or **leave it blank**. Geometry optimization searches for the arrangement of atoms in a molecule that corresponds to a minimum of the energy, the so-called *equilibrium geometry* <br><br>
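If you need to write many input lines, a small helper can assemble the tab-separated fields for you. The sketch below uses a hypothetical `make_inp_line` function (it is not part of openCOSMO-RS); it writes the uppercase TRUE/FALSE spelling from the list above and leaves the last field out entirely when no optimization choice is given.

```python
def make_inp_line(name, smiles, xyz_file="", charge=0, optimize=None):
    """Build one tab-separated line of the .inp file (hypothetical helper).

    optimize=None leaves the geometry_optimization field out;
    True/False write TRUE/FALSE as described in the field list.
    """
    fields = [name, smiles, xyz_file, str(charge)]
    if optimize is not None:
        fields.append("TRUE" if optimize else "FALSE")
    # The [TAB] separator is the '\t' character
    return "\t".join(fields)


print(repr(make_inp_line("water", "O")))
print(repr(make_inp_line("butane", "CCCC", "butane_xyz_file", 0, True)))
```

The first call produces the same `water\tO\t\t0` line used for water in this notebook: the empty xyz_file still gets its own slot between two tabs.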

Both the **<span style="color:LightSteelBlue">*xyz_file*</span>** and the <span style="color:#DB7093">geometry_optimization</span> are optional: <br><br>

- If we do not provide an xyz_file, the program uses the RDKit Python library or Balloon to generate an initial conformer, which is then passed to xtb for an extensive conformer search. At the end, the program gives us an xyz file generated from those calculations <br>

- If we do not request the geometry optimization, the final calculations are performed on the geometry from the xyz file as it is, without optimizing the positions of the atoms<br>
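To make the optional fields concrete, here is a hypothetical parser showing one way a line with omitted fields could be interpreted. `parse_inp_line` is an illustration, not the program's own reader, and it assumes that a blank xyz_file means "generate a conformer", that a blank or missing geometry_optimization means "do not optimize", and that TRUE/FALSE are case-insensitive.

```python
FIELD_NAMES = ["name", "smiles", "xyz_file", "charge", "geometry_optimization"]


def parse_inp_line(line):
    """Split one .inp line into a dict, filling in omitted optional fields.

    Assumptions (not taken from openCOSMO-RS): blank xyz_file -> None,
    blank/missing geometry_optimization -> False, TRUE matched case-insensitively.
    """
    parts = line.rstrip("\r\n").split("\t")
    # Pad trailing fields that were left out (e.g. no geometry_optimization column)
    parts += [""] * (len(FIELD_NAMES) - len(parts))
    record = dict(zip(FIELD_NAMES, parts))
    record["xyz_file"] = record["xyz_file"] or None
    record["charge"] = int(record["charge"])  # int() accepts "+1" and "-1"
    record["geometry_optimization"] = record["geometry_optimization"].strip().upper() == "TRUE"
    return record


print(parse_inp_line("water\tO\t\t0"))
# {'name': 'water', 'smiles': 'O', 'xyz_file': None, 'charge': 0, 'geometry_optimization': False}
```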

<br>

For example, to create an input file for the water molecule we can write:

**water [TAB] O [TAB] [TAB] 0**

Here we did not include the <span style="color:LightSteelBlue">*xyz_file*</span> or the <span style="color:#DB7093">geometry_optimization</span>: the two consecutive [TAB]s leave the xyz_file field empty, and the line simply ends after the charge.

<br>
</div>

**On every line marked with ⚠️⚠️⚠️, you need to change the path to the correct location on your computer**

```python
import os

# Let's now create and save our file
# The [TAB] in Python is represented with '\t'
content = "water\tO\t\t0"

# Set up the folder and name the file ⚠️⚠️⚠️
folder = "INPUT_FILES"
file_path = os.path.join(folder, "water.inp")

# Create the folder if it does not exist yet
os.makedirs(folder, exist_ok=True)

# Open the file in write mode and save the content
with open(file_path, "w") as file:
    file.write(content)

print(f"File created at {file_path}")
```

File created at INPUT_FILES\water.inp


<div style="line-height: 1.5;">
You can also put more than one molecule in the same input file, one molecule per line! <br>
Let's create one for butane and pentane, setting up each molecule differently to show the options we have: <br>

- For **butane** we will include both the <span style="color:LightSteelBlue">*xyz_file*</span> and the <span style="color:#DB7093">geometry_optimization</span> <br>
- For **pentane** we will omit the <span style="color:LightSteelBlue">*xyz_file*</span> and set the <span style="color:#DB7093">geometry_optimization</span> to **FALSE**

<br><br>

First, let's create the <span style="color:LightSteelBlue">*xyz_file*</span> for butane. An <span style="color:LightSteelBlue">*xyz_file*</span> is a simple text file format used to describe the atomic coordinates of a molecule, and it can be obtained from online databases.

The file is structured as follows:

1. **First Line**: Contains the number of atoms in the molecule.
2. **Second Line**: A comment line, which can be left blank or used to describe the molecule.
3. **Subsequent Lines**: Each line corresponds to one atom and contains the atom's chemical symbol followed by its X, Y, and Z coordinates (conventionally in ångströms).
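Before using an xyz file, it can help to check that it follows this structure. The sketch below is a hypothetical `parse_xyz` helper written for illustration (not part of openCOSMO-RS); it reads the three parts listed above and raises an error when the atom count on the first line does not match the number of coordinate lines. The water coordinates are approximate values used only as an example.

```python
def parse_xyz(text):
    """Parse XYZ-format text into (comment, [(symbol, x, y, z), ...]).

    Raises ValueError if the atom count on the first line does not match
    the number of coordinate lines.
    """
    lines = text.strip("\n").splitlines()
    n_atoms = int(lines[0].strip())                 # first line: number of atoms
    comment = lines[1] if len(lines) > 1 else ""    # second line: comment (may be blank)
    atoms = []
    for line in lines[2:]:
        if not line.strip():
            continue  # ignore blank trailing lines
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"header says {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


# Approximate water geometry, for illustration only
water_xyz = """3
water example
O  0.000  0.000  0.117
H  0.000  0.757 -0.469
H  0.000 -0.757 -0.469
"""

comment, atoms = parse_xyz(water_xyz)
print(comment, len(atoms))  # water example 3
```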


</div>

```python
import os

# First we set up the folder and name the file ⚠️⚠️⚠️
folder = "INPUT_FILES"
file_path = os.path.join(folder, "butane_xyz_file.xyz")

# Remember that the formula for butane is C4H10, so we have 14 atoms in total
# and therefore 14 lines of xyz coordinates after the two header lines
xyz_content = """14
energy: -158.5047254379879860
C  1.9543949951702300  0.1066957079000400  0.1775875249139100
C  0.4882751879667100  0.5453039742252200  0.2277970663069400
C  -0.4882287405492300  -0.5452110216957901  -0.2281787312959500
C  -1.9543910949774299  -0.1067726259116600  -0.1772934340389700
H  2.2499857554831100  -0.1758360459389800  -0.8445376818148200
H  2.6285568228203302  0.9095199005446900  0.5096167887711700
H  2.1305732780439999  -0.7660158334354400  0.8251181873099500
H  0.3484260931082300  1.4411742720395899  -0.4016176143351700
H  0.2297988124954200  0.8546292299119000  1.2553444804038401
H  -0.2301753030818900  -0.8541267816543600  -1.2559543749457500
H  -0.3480843754759300  -1.4412851333271901  0.4008742289139700
H  -2.1311096320036498  0.7656513725151000  -0.8250616913889099
H  -2.6286360949013798  -0.9098232985523900  -0.5086067693224000
H  -2.2493857040985001  0.1760962833792700  0.8449120205221400
"""

# Create the folder if it does not exist yet
os.makedirs(folder, exist_ok=True)

with open(file_path, "w") as file:
    file.write(xyz_content)

print(f"File created at {file_path}")
```

File created at INPUT_FILES\butane_xyz_file.xyz
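The atom count on the first line of the xyz file comes straight from the molecular formula (C4H10 gives 4 + 10 = 14 atoms). A small, hypothetical `count_atoms` helper can do that arithmetic for you; it assumes a plain formula made of element symbols and counts, without parentheses or charges.

```python
import re


def count_atoms(formula):
    """Count the total number of atoms in a simple formula like 'C4H10'.

    Each match is an element symbol (capital letter + optional lowercase
    letter) followed by an optional count; a missing count means 1.
    """
    total = 0
    for _symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += int(count) if count else 1
    return total


print(count_atoms("C4H10"))  # 14, the first line of the butane xyz file
print(count_atoms("C5H12"))  # 17 atoms for pentane
```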


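```python
import os

# One tab-separated line per molecule:
# butane with xyz_file and geometry optimization,
# pentane without xyz_file and without geometry optimization
molecules = [
    "butane\tCCCC\tbutane_xyz_file\t0\tTrue",
    "pentane\tCCCCC\t\t0\tFalse",
]

# Set up the folder and name the file ⚠️⚠️⚠️
folder = "INPUT_FILES"
file_path = os.path.join(folder, "two_molecules.inp")

# Create the folder if it does not exist yet
os.makedirs(folder, exist_ok=True)

# Open the file in write mode
with open(file_path, "w") as file:
    # Join the list items with a newline character: one molecule per line
    file.write("\n".join(molecules))

print(f"File created at {file_path}")
```

File created at INPUT_FILES\two_molecules.inp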
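To double-check what was written, you can read the file back and split each line at the tabs. This is a sketch with a hypothetical `read_inp` helper; so that the demo runs anywhere, it writes the same two lines to a temporary folder instead of `INPUT_FILES`.

```python
import os
import tempfile


def read_inp(path):
    """Return one list of tab-separated fields per non-empty line of an .inp file."""
    with open(path) as f:
        return [line.rstrip("\r\n").split("\t") for line in f if line.strip()]


# Demo: write the two-molecule input to a temporary folder and read it back
molecules = [
    "butane\tCCCC\tbutane_xyz_file\t0\tTrue",
    "pentane\tCCCCC\t\t0\tFalse",
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "two_molecules.inp")
    with open(path, "w") as f:
        f.write("\n".join(molecules))
    rows = read_inp(path)

for fields in rows:
    print(fields)
```

To check your own file, call `read_inp` with the path you set on the ⚠️⚠️⚠️ line. An empty string in the third position, as for pentane, means the xyz_file was left out.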
