# GCN_Command_Guideline

## regression.py
- '--seed', '-s', type=int, default=123, help='seed'
- '--input_path', '-D', type=str, required=True, help="dataset path and name ('./data01.xlsx')"
- '--solute_smiles', '-X1', type=str, required=True, help="column name of solute smiles ('Solute SMILES')"
- '--solvent_smiles', '-X2', type=str, required=True, help="column name of solvent smiles ('Solvent SMILES')"
- '--logS', '-Y', type=str, required=True, help="column name of logS ('LogS')"
- '--output_path', '-O', type=str, default='./results/result.json', help="output path and name"
- '--model_path', '-M', type=str, default='./results/model.pt', help="model path and name"
- '--conv', '-c', type=str, default='GCNConv', choices=['GCNConv', 'ARMAConv', 'SAGEConv']
- '--test_size', '-z', type=float, default=0.2, help='test size'
- '--random_state', '-r', type=int, default=123, help='random state'
- '--batch_size', '-b', type=int, default=256, help='batch size'
- '--epoch', '-e', type=int, default=200, help='epoch'
- '--lr', '-l', type=float, default=0.005, help='learning rate'
- '--step_size', '-t', type=int, default=5, help='step_size of lr_scheduler'
- '--gamma', '-g', type=float, default=0.9, help='gamma of lr_scheduler'
- '--dropout', '-d', type=float, default=0.1, help='dropout'
- '--exp_name', '-n', type=str, default='myExp', help='experiment name'
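
The options above read like the arguments of `add_argument` calls. A minimal sketch of such a parser follows, assuming the script uses Python's standard `argparse` module. `build_parser` is a hypothetical name, only a subset of the options is repeated, and the actual `regression.py` may be organized differently.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of part of the regression.py options."""
    p = argparse.ArgumentParser(description="GCN logS regression (sketch)")
    p.add_argument('--seed', '-s', type=int, default=123, help='seed')
    p.add_argument('--input_path', '-D', type=str, required=True,
                   help="dataset path and name ('./data01.xlsx')")
    p.add_argument('--solute_smiles', '-X1', type=str, required=True,
                   help="column name of solute smiles ('Solute SMILES')")
    p.add_argument('--solvent_smiles', '-X2', type=str, required=True,
                   help="column name of solvent smiles ('Solvent SMILES')")
    p.add_argument('--logS', '-Y', type=str, required=True,
                   help="column name of logS ('LogS')")
    p.add_argument('--output_path', '-O', type=str,
                   default='./results/result.json', help='output path and name')
    p.add_argument('--conv', '-c', type=str, default='GCNConv',
                   choices=['GCNConv', 'ARMAConv', 'SAGEConv'])
    p.add_argument('--epoch', '-e', type=int, default=200, help='epoch')
    p.add_argument('--lr', '-l', type=float, default=0.005, help='learning rate')
    return p


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args([
    '-D', './data01.xlsx',
    '-X1', 'Solute SMILES',
    '-X2', 'Solvent SMILES',
    '-Y', 'LogS',
    '-e', '10',
])
print(args.epoch, args.lr, args.conv)  # 10 0.005 GCNConv
```

Options that are not passed fall back to their defaults, so only the four required options are needed to launch a run.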

In [9]:
!python ../regression.py\
        -D './data01.xlsx'\
        -X1 'Solute SMILES'\
        -X2 'Solvent SMILES'\
        -Y 'LogS'\
        -O './results/regression/test_regre.json'\
        -M './results/regression/test_regre_model.pt'\
        -e 10


Graph Convolutional Network for logS Regression
Soongsil University, Seoul, South Korea
Computational Science and Artificial Intelligence Lab

[Preparing Data]
- Device : cuda

[Converting to Graph]
- Train Data : 14076
- Test Data : 3520

[Train]
- Epoch : 1
- Loss : 2.4276
- Epoch : 2
- Loss : 1.5367
- Epoch : 3
- Loss : 1.3740
- Epoch : 4
- Loss : 1.2480
- Epoch : 5
- Loss : 1.2089
- Epoch : 6
- Loss : 1.1476
- Epoch : 7
- Loss : 1.1169
- Epoch : 8
- Loss : 1.1159
- Epoch : 9
- Loss : 1.0549
- Epoch : 10
- Loss : 1.0461

[Test]
- MAE : 0.7586
- MSE : 1.1456
- R2 : 0.7584
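
For reference, the three test metrics can be computed as in the sketch below. It assumes the standard definitions of MAE, MSE and R2; the script itself may use a library such as scikit-learn instead. The function name and the sample values are illustrative and are not taken from the dataset.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R2) using the standard textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2


# Illustrative logS values only.
y_true = [0.0, -1.0, -2.0, -3.0]
y_pred = [0.5, -1.0, -2.5, -3.0]
mae, mse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.4f} MSE={mse:.4f} R2={r2:.4f}")
```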



## regre_test.py
- '--model_path', '-M', type=str, default='./results/model.pt', help="model path and name ('./results/model.pt')"
- '--input_path', '-I', type=str, default='./smiles.txt', help="input path and name ('./smiles.txt')"
- '--output_path', '-O', type=str, default='./results/pred_results.txt', help="output path and name ('./results/pred_results.txt')"
- '--conv', '-c', type=str, default='GCNConv', choices=['GCNConv', 'ARMAConv', 'SAGEConv']
- '--dropout', '-d', type=float, default=0.1, help='dropout'

In [10]:
!cat ./smiles.txt

Solute SMILES	Solvent SMILES
C=CC(=O)N	O
C=CC(=O)N	CC(=O)C
CCCC1=CC=CC=C1	O
CCCC1=CC=CC=C1	CC(=O)C
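
The input file shown above appears to be tab-separated, with one header row followed by one solute/solvent pair per line. A minimal sketch for producing such a file follows, assuming that layout is what the script expects. `write_pairs` is a hypothetical helper, and the file is written to a temporary directory so that no existing `smiles.txt` is overwritten.

```python
import csv
import os
import tempfile

# The same four solute/solvent pairs as in the listing above.
pairs = [
    ("C=CC(=O)N", "O"),
    ("C=CC(=O)N", "CC(=O)C"),
    ("CCCC1=CC=CC=C1", "O"),
    ("CCCC1=CC=CC=C1", "CC(=O)C"),
]


def write_pairs(path, rows):
    """Write a header row and the SMILES pairs as tab-separated text."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["Solute SMILES", "Solvent SMILES"])
        writer.writerows(rows)


out_path = os.path.join(tempfile.mkdtemp(), "smiles.txt")
write_pairs(out_path, pairs)
with open(out_path) as f:
    print(f.read())
```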

In [11]:
!python ../regre_test.py\
        -M './results/regression/test_regre_model.pt'\
        -I './smiles.txt'\
        -O './results/regression/myTest.txt'


Graph Nonvolutional Network for Regression logS
Soongsil University, Seoul, South Korea
Computational Science and Artificial Intelligence Lab

Done!



In [12]:
!cat ./results/regression/myTest.txt

Solute SMILES	Solvent SMILES	logS
C=CC(=O)N	O	0.08149509
C=CC(=O)N	CC(=O)C	0.20979212
CCCC1=CC=CC=C1	O	-2.7093468
CCCC1=CC=CC=C1	CC(=O)C	0.5737878
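
The prediction file can be loaded back for further analysis. The sketch below assumes the tab-separated layout shown above and uses the listed rows as inline sample data; `load_predictions` is a hypothetical helper.

```python
import csv
import io

# Inline copy of the prediction file shown above (tab-separated).
sample = (
    "Solute SMILES\tSolvent SMILES\tlogS\n"
    "C=CC(=O)N\tO\t0.08149509\n"
    "C=CC(=O)N\tCC(=O)C\t0.20979212\n"
    "CCCC1=CC=CC=C1\tO\t-2.7093468\n"
    "CCCC1=CC=CC=C1\tCC(=O)C\t0.5737878\n"
)


def load_predictions(handle):
    """Return a list of (solute, solvent, logS) tuples from an open file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["Solute SMILES"], row["Solvent SMILES"], float(row["logS"]))
        for row in reader
    ]


preds = load_predictions(io.StringIO(sample))
# Pick the pair with the lowest predicted logS.
lowest = min(preds, key=lambda row: row[2])
print(lowest)  # ('CCCC1=CC=CC=C1', 'O', -2.7093468)
```

For a real run, pass `open('./results/regression/myTest.txt')` instead of the `io.StringIO` object.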


## classification.py
- '--seed', '-s', type=int, default=123, help='seed'
- '--input_path', '-D', type=str, required=True, help="dataset path and name ('./data01.xlsx')"
- '--solute_smiles', '-X1', type=str, required=True, help="column name of solute smiles ('Solute SMILES')"
- '--solvent_smiles', '-X2', type=str, required=True, help="column name of solvent smiles ('Solvent SMILES')"
- '--logS', '-Y', type=str, required=True, help="column name of logS ('LogS')"
- '--output_path', '-O', type=str, default='./results/result.json', help="output path and name"
- '--model_path', '-M', type=str, default='./results/model.pt', help="model path and name"
- '--conv', '-c', type=str, default='GCNConv', choices=['GCNConv', 'ARMAConv', 'SAGEConv']
- '--test_size', '-z', type=float, default=0.2, help='test size'
- '--random_state', '-r', type=int, default=123, help='random state'
- '--batch_size', '-b', type=int, default=256, help='batch size'
- '--epoch', '-e', type=int, default=100, help='epoch'
- '--lr', '-l', type=float, default=0.0005, help='learning rate'
- '--step_size', '-t', type=int, default=5, help='step_size of lr_scheduler'
- '--gamma', '-g', type=float, default=0.9, help='gamma of lr_scheduler'
- '--dropout', '-d', type=float, default=0.1, help='dropout'
- '--exp_name', '-n', type=str, default='myExp', help='experiment name'
- '--ROC', '-ROC', action='store_true', help='save ROC curve result'
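
The `--lr`, `--step_size` and `--gamma` options interact. Their names match the parameters of PyTorch's `torch.optim.lr_scheduler.StepLR`, so the sketch below assumes a step-decay schedule in which the learning rate is multiplied by `gamma` once every `step_size` epochs. That is an assumption about the script, not something the option list confirms, and `step_lr` is a hypothetical helper written without PyTorch.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate after `epoch` completed epochs under step decay.

    `epoch` counts scheduler steps from 0, so the rate first drops
    once `step_size` epochs have finished.
    """
    return base_lr * gamma ** (epoch // step_size)


# classification.py defaults: lr=0.0005, step_size=5, gamma=0.9
for epoch in (0, 4, 5, 10, 99):
    print(epoch, step_lr(0.0005, 5, 0.9, epoch))
```

Under this assumption, a 10-epoch run with the defaults decays the rate only once during training, after the fifth epoch.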

In [13]:
!python ../classification.py\
        -D './data01.xlsx'\
        -X1 'Solute SMILES'\
        -X2 'Solvent SMILES'\
        -Y 'LogS'\
        -O './results/classification/test_class.json'\
        -M './results/classification/test_class_model.pt'\
        -e 10


Graph Convolutional Network for logS Classification
Soongsil University, Seoul, South Korea
Computational Science and Artificial Intelligence Lab

[Preparing Data]
- Device : cuda

[Converting to Graph]
- Train Data : 14076
- Test Data : 3520

[Train]
- Epoch : 1
- Loss : 0.7374
- Accuracy : 66.3484
- Epoch : 2
- Loss : 0.6178
- Accuracy : 72.6562
- Epoch : 3
- Loss : 0.5600
- Accuracy : 75.8898
- Epoch : 4
- Loss : 0.5385
- Accuracy : 76.6855
- Epoch : 5
- Loss : 0.5167
- Accuracy : 77.9586
- Epoch : 6
- Loss : 0.4876
- Accuracy : 79.4705
- Epoch : 7
- Loss : 0.4686
- Accuracy : 80.3458
- Epoch : 8
- Loss : 0.4607
- Accuracy : 80.5773
- Epoch : 9
- Loss : 0.4427
- Accuracy : 81.8142
- Epoch : 10
- Loss : 0.4095
- Accuracy : 82.8125

[Test]
- Total Accuracy : 78 %
- Accuracy of Low : 80 %
- Accuracy of Medium : 73 %
- Accuracy of High : 82 %
- F-1 Micro Score : 0.78
- F-1 Macro Score : 0.79
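
The micro and macro F-1 scores differ only in how per-class counts are combined. The pure-Python sketch below assumes the standard definitions; the script may compute them with a library instead. Micro averaging pools true positives, false positives and false negatives over all classes, while macro averaging takes the unweighted mean of the per-class F-1 values. The labels below are illustrative and are not taken from the dataset.

```python
def f1_scores(y_true, y_pred, labels):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


labels = ["Low", "Medium", "High"]
y_true = ["Low", "Low", "Medium", "Medium", "High", "High"]
y_pred = ["Low", "Medium", "Medium", "Medium", "High", "Low"]
micro, macro = f1_scores(y_true, y_pred, labels)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"micro={micro:.4f} macro={macro:.4f} accuracy={accuracy:.4f}")
```

When every sample has exactly one label, the micro F-1 score equals the overall accuracy, which is consistent with the 78 % total accuracy and the 0.78 micro score reported above.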



## class_test.py
- '--model_path', '-M', type=str, default='./results/model.pt', help="model path and name ('./results/model.pt')"
- '--input_path', '-I', type=str, default='./smiles.txt', help="input path and name ('./smiles.txt')"
- '--output_path', '-O', type=str, default='./results/pred_results.txt', help="output path and name"
- '--conv', '-c', type=str, default='GCNConv', choices=['GCNConv', 'ARMAConv', 'SAGEConv']
- '--dropout', '-d', type=float, default=0.1, help='dropout'

In [14]:
!cat ./smiles.txt

Solute SMILES	Solvent SMILES
C=CC(=O)N	O
C=CC(=O)N	CC(=O)C
CCCC1=CC=CC=C1	O
CCCC1=CC=CC=C1	CC(=O)C

In [15]:
!python ../class_test.py\
        -M './results/classification/test_class_model.pt'\
        -I './smiles.txt'\
        -O './results/classification/myTest.txt'


Graph Convolutional Network for Regression logS
Soongsil University, Seoul, South Korea
Computational Science and Artificial Intelligence Lab

Done!



In [16]:
!cat ./results/classification/myTest.txt

Solute SMILES	Solvent SMILES	Class
C=CC(=O)N	O	High
C=CC(=O)N	CC(=O)C	High
CCCC1=CC=CC=C1	O	Low
CCCC1=CC=CC=C1	CC(=O)C	High
