# ZhangEN

## Index
1. [Instantiate model class](#Instantiate-model-class)
2. [Define clock metadata](#Define-clock-metadata)
3. [Download clock dependencies](#Download-clock-dependencies)
4. [Load features](#Load-features)
5. [Load weights into base model](#Load-weights-into-base-model)
6. [Load reference values](#Load-reference-values)
7. [Load preprocess and postprocess objects](#Load-preprocess-and-postprocess-objects)
8. [Check all clock parameters](#Check-all-clock-parameters)
9. [Basic test](#Basic-test)
10. [Save torch model](#Save-torch-model)
11. [Clear directory](#Clear-directory)

Let's first import some packages:

In [1]:
import os
import inspect
import shutil
import json
import torch
import pandas as pd
import pyaging as pya

## Instantiate model class

In [2]:
def print_entire_class(cls):
    source = inspect.getsource(cls)
    print(source)

print_entire_class(pya.models.ZhangEN)

class ZhangEN(pyagingModel):
    def __init__(self):
        super().__init__()

    def preprocess(self, x):
        """
        Scales the input PyTorch tensor per row with mean 0 and std 1.
        """
        row_means = torch.mean(x, dim=1, keepdim=True)
        row_stds = torch.std(x, dim=1, keepdim=True)

        # Avoid division by zero in case of a row with constant value
        row_stds = torch.where(row_stds == 0, torch.ones_like(row_stds), row_stds)

        x_scaled = (x - row_means) / row_stds
        return x_scaled

    def postprocess(self, x):
        return x
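The `preprocess` method shown above standardizes each sample (row) independently. It subtracts the row mean, divides by the row standard deviation, and substitutes 1 when that standard deviation is zero. Here is a minimal pure-Python sketch of the same arithmetic. The name `scale_rows` is hypothetical, and the sketch assumes `torch.std`'s default unbiased estimator (denominator n - 1), so each row needs at least two values.

```python
import math

def scale_rows(rows):
    """Standardize each row to mean 0 and sample std 1, mirroring ZhangEN.preprocess."""
    scaled = []
    for row in rows:
        n = len(row)
        mean = sum(row) / n
        # Unbiased variance (n - 1 denominator), matching torch.std's default
        var = sum((v - mean) ** 2 for v in row) / (n - 1)
        std = math.sqrt(var)
        # Same guard as the torch code: a constant row gets std 1 instead of 0
        if std == 0:
            std = 1.0
        scaled.append([(v - mean) / std for v in row])
    return scaled

print(scale_rows([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))
# → [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

The constant second row shows why the guard exists. Without it, the division would produce NaNs instead of zeros.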



In [3]:
model = pya.models.ZhangEN()

## Define clock metadata

In [4]:
model.metadata["clock_name"] = 'zhangen'
model.metadata["data_type"] = 'methylation'
model.metadata["species"] = 'Homo sapiens'
model.metadata["year"] = 2019
model.metadata["approved_by_author"] = '⌛'
model.metadata["citation"] = "Zhang, Qian, et al. \"Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing.\" Genome medicine 11 (2019): 1-11."
model.metadata["doi"] = 'https://doi.org/10.1186/s13073-019-0667-1'
model.metadata["research_only"] = None
model.metadata["notes"] = None

## Download clock dependencies

#### Download GitHub repository

In [5]:
github_url = "https://github.com/qzhang314/DNAm-based-age-predictor.git"
github_folder_name = github_url.split('/')[-1].split('.')[0]
os.system(f"git clone {github_url}")

0

#### Export example data with R

In [6]:
%%writefile download.r

data = readRDS("DNAm-based-age-predictor/data.rds")

write.csv(data, "example_data.csv")

Writing download.r


In [7]:
os.system("Rscript download.r")

0
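Both cells above call `os.system`, which returns the command's exit status. The `0` printed after each cell means success, but a failure such as `git clone` refusing to overwrite an existing folder would not stop the notebook. The sketch below shows a stricter alternative using the standard library's `subprocess.run(..., check=True)`, which raises `CalledProcessError` on a non-zero exit. The helper name `run_checked` is hypothetical.

```python
import subprocess
import sys

def run_checked(cmd):
    """Run a command given as a list of arguments and return its stdout as text.

    Raises subprocess.CalledProcessError on a non-zero exit status instead of
    returning the status silently, as os.system does.
    """
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout

# Harmless self-check using the current Python interpreter
print(run_checked([sys.executable, "-c", "print('ok')"]).strip())

# Equivalent calls for this notebook (need git and Rscript on PATH):
# run_checked(["git", "clone", "https://github.com/qzhang314/DNAm-based-age-predictor.git"])
# run_checked(["Rscript", "download.r"])
```

Passing a list of arguments rather than a single string also avoids shell quoting issues if a path ever contains spaces.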

## Load features

#### From coefficient file

In [8]:
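# en.coef is a space-separated table with 'probe' and 'coef' columns
df = pd.read_table('DNAm-based-age-predictor/en.coef', sep=' ')
df['feature'] = df['probe']
df['coefficient'] = df['coef']

# The first row holds the intercept; the remaining rows are the CpG probes
model.features = df['feature'][1:].tolist()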
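The cell above and the weights cell that follows both treat the first data row of `en.coef` as the intercept and every later row as a probe. The following stdlib-only sketch makes that split explicit. It assumes an unquoted header line `probe coef`. The function name `split_coefficients` and the toy rows (`intercept`, `cgA`, `cgB` and their values) are hypothetical placeholders, not the real file contents.

```python
def split_coefficients(text):
    """Parse a space-separated 'probe coef' table.

    The first data row is taken as the intercept; the remaining rows become
    (feature, weight) pairs, in file order.
    """
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    probe_idx, coef_idx = header.index("probe"), header.index("coef")
    intercept = float(rows[0][coef_idx])
    features = [r[probe_idx] for r in rows[1:]]
    weights = [float(r[coef_idx]) for r in rows[1:]]
    return intercept, features, weights

# Toy input (hypothetical values, for illustration only)
toy = """probe coef
intercept 60.0
cgA 1.5
cgB -2.0
"""
intercept, features, weights = split_coefficients(toy)
print(intercept, features, weights)
# → 60.0 ['cgA', 'cgB'] [1.5, -2.0]
```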

## Load weights into base model

In [9]:
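# Probe coefficients, shaped (1, n_features) to match the linear layer's weight matrix
weights = torch.tensor(df['coefficient'][1:].tolist()).unsqueeze(0)
# First coefficient row is the intercept (bias)
intercept = torch.tensor([df['coefficient'][0]])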

#### Linear model

In [10]:
base_model = pya.models.LinearModel(input_dim=len(model.features))

base_model.linear.weight.data = weights.float()
base_model.linear.bias.data = intercept.float()

model.base_model = base_model
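This sketch assumes `LinearModel.linear` is a standard `torch.nn.Linear(input_dim, 1)`. Under that assumption, the layer computes `x @ weight.T + bias`, which explains why the weights were given shape (1, n_features). For each sample, this reduces to the intercept plus a weighted sum of the row-scaled methylation values. Below is a pure-Python sketch of that per-sample sum. `predict_age` and the numbers are illustrative only, not the ZhangEN coefficients.

```python
def predict_age(x_rows, weights, intercept):
    """Linear clock: age = intercept + sum_i weight_i * x_i, for each sample row."""
    return [intercept + sum(w * v for w, v in zip(weights, row)) for row in x_rows]

# Toy example: two features, two samples (values are hypothetical)
print(predict_age([[0.5, 1.5], [0.0, 0.0]], weights=[2.0, -1.0], intercept=65.0))
# → [64.5, 65.0]
```

In the full model, `x_rows` would be the output of `preprocess` (per-row scaling), not the raw beta values.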

## Load reference values

#### From CSV file

In [11]:
reference_feature_values_df = pd.read_csv('example_data.csv', index_col=0)
reference_feature_values_df = reference_feature_values_df.loc[:, model.features]
model.reference_values = reference_feature_values_df.mean().tolist()
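The reference values computed above are per-feature means over the example samples (column means of a samples × probes table). One plausible use of such means is to stand in for probes that are missing from a new dataset. That use is an assumption here, not a description of pyaging's internals. The following is a minimal sketch of both steps. `column_means`, `fill_missing`, and the toy data are hypothetical.

```python
def column_means(rows):
    """Mean of each column (feature) across samples (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fill_missing(sample, features, reference):
    """Build a feature vector in model order, using reference values for absent probes.

    `sample` is a dict {probe: value}. This helper is illustrative, not pyaging's API.
    """
    ref = dict(zip(features, reference))
    return [sample.get(f, ref[f]) for f in features]

features = ["cgA", "cgB"]  # hypothetical probe names
reference = column_means([[0.25, 1.0], [0.75, 0.5]])
print(reference)
# → [0.5, 0.75]
print(fill_missing({"cgA": 0.9}, features, reference))
# → [0.9, 0.75]
```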

## Load preprocess and postprocess objects

In [12]:
model.preprocess_name = 'scale_row'
model.preprocess_dependencies = None

In [13]:
model.postprocess_name = None
model.postprocess_dependencies = None

## Check all clock parameters

In [14]:
pya.utils.print_model_details(model)


Model Attributes:

training: True
metadata: {'approved_by_author': '⌛',
 'citation': 'Zhang, Qian, et al. "Improved precision of epigenetic clock '
             'estimates across tissues and its implication for biological '
             'ageing." Genome medicine 11 (2019): 1-11.',
 'clock_name': 'zhangen',
 'data_type': 'methylation',
 'doi': 'https://doi.org/10.1186/s13073-019-0667-1',
 'notes': None,
 'research_only': None,
 'species': 'Homo sapiens',
 'version': None,
 'year': 2019}
reference_values: [0.4203926578618443, 0.4908907575500855, 0.4552739071801655, 0.4913173878831697, 0.1630209603278157, 0.28031076416301215, 0.5630446021353376, 0.07173451377952844, 0.5286791351513329, 0.4914897138593993, 0.8776324935503583, 0.7783989797577173, 0.3679109172453385, 0.5469457656601266, 0.36321717183155133, 0.3929905988245433, 0.11900061695097393, 0.1448950534091979, 0.11166595534101968, 0.08958121797351054, 0.262072821834283, 0.4936214944065201, 0.07967343711600829, 0.4159523391121834, 0.6

## Basic test

In [15]:
torch.manual_seed(42)
# 10 random samples with one column per clock feature
x = torch.randn(10, len(model.features), dtype=torch.float64)
model.eval()
model.to(torch.float64)
pred = model(x)
pred

tensor([[ 28.2548],
        [104.3112],
        [ 83.9525],
        [ 78.0909],
        [ 60.1387],
        [ 63.0642],
        [ 67.6295],
        [ 73.1206],
        [ 72.9059],
        [ 56.8473]], dtype=torch.float64, grad_fn=<AddmmBackward0>)

## Save torch model

In [16]:
torch.save(model, f"../weights/{model.metadata['clock_name']}.pt")

## Clear directory

In [17]:
# Function to remove a folder and all its contents
def remove_folder(path):
    try:
        shutil.rmtree(path)
        print(f"Deleted folder: {path}")
    except Exception as e:
        print(f"Error deleting folder {path}: {e}")

# Get a list of all files and folders in the current directory
all_items = os.listdir('.')

# Loop through the items
for item in all_items:
    # Check if it's a file and does not end with .ipynb
    if os.path.isfile(item) and not item.endswith('.ipynb'):
        os.remove(item)
        print(f"Deleted file: {item}")
    # Check if it's a folder
    elif os.path.isdir(item):
        remove_folder(item)

Deleted file: download.r
Deleted folder: DNAm-based-age-predictor
Deleted file: example_data.csv
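The cleanup cell above deletes every non-`.ipynb` file and every folder in the working directory. It would therefore also remove unrelated files if the notebook were run from a shared folder. A narrower alternative, sketched below, deletes only the artifacts this notebook created. The helper `remove_artifacts` is hypothetical, and the artifact names are taken from the cells above.

```python
import os
import shutil

def remove_artifacts(paths):
    """Delete only the named files/folders and return the paths actually removed."""
    removed = []
    for path in paths:
        if os.path.isdir(path):
            shutil.rmtree(path)
        elif os.path.isfile(path):
            os.remove(path)
        else:
            continue  # already gone: nothing to delete
        removed.append(path)
    return removed

# Artifacts created by this notebook (names from the cells above)
notebook_artifacts = ["download.r", "example_data.csv", "DNAm-based-age-predictor"]
# remove_artifacts(notebook_artifacts)
```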
