# Progesterone production in *Saccharomyces cerevisiae* 

## 1. Introduction

### 1.1 Literature review of the compound

Steroids are ring-structured lipophilic compounds that serve various functions in cells. They are of great pharmaceutical interest, as they are used to treat various diseases and to manage both male and female fertility (Tong et al. 2009). Progesterone (**Figure 1**), a female sex hormone, is among the most valuable steroid drugs (Batth et al. 2020). It is mainly used as a contraceptive and has been on the market for decades (Howie 1985; Nath et al. 2010).

The market size for progesterone alone is currently estimated at USD 800 million and is expected to reach USD 1,569 million by 2027. In other words, the market will nearly double within the next five years due to the increasing demand for progesterone (Market Data Forecast, 2022).


<p>
<img src="figures/progesterone.png" width="75%" />
</p>

**Figure 1.** The structure of progesterone ($\textrm{C}_{21}\textrm{H}_{30}\textrm{O}_{2}$), a female sex steroid hormone.

The chemical synthesis of steroids was a highly competitive research field during the 20th century (Slater 2000). In 1940, Bachmann and Wilds produced the sex hormone equilenin as the first complex molecule to be chemically synthesized (Bachmann et al. 1940). In the following decades, many other steroids, including progesterone, were fully chemically synthesized, with R. B. Woodward revolutionizing the field in 1952 (Woodward et al. 1952; Al Jasem 2014). He was later awarded the 1965 Nobel Prize in Chemistry for his achievements in organic synthesis, including this work (Bartlett et al. 1965).

Despite great efforts in the chemical synthesis of steroids, their structural complexity complicates synthesis, which often requires harsh conditions and causes heavy environmental pollution (Tong et al. 2009). Therefore, most steroid drugs are produced semi-synthetically, using a naturally abundant complex precursor as a starting point. Diosgenin, a steroidal sapogenin, is commonly used as a precursor and can be extracted from plants of the *Dioscorea* genus (Jesus et al. 2016; Al Jasem 2014; Dong et al. 2015). For example, progesterone can be produced from diosgenin via the so-called Marker synthesis (Al Jasem 2014). However, the protected status and limited availability of these *Dioscorea* plants have driven up the market price of diosgenin.

Collectively, the obstacles in the synthesis of steroids pushed researchers to look towards alternative production methods, namely through natural biosynthesis using microbial cell factories (Tong et al. 2009). Mild reaction conditions, lower chemical pollution, and higher conversion rates are among the advantages of using microbial cell factories in steroid production as compared to chemical synthesis (Tong et al. 2009). To our knowledge, no progesterone-producing cell factory has been published. Therefore, we decided to design one.

Understanding the biosynthetic pathway of progesterone is crucial for designing a cell factory, since it guides the choice of a suitable host. The biosynthesis of all steroids starts with the production of the triterpenoid precursor squalene in the mevalonate pathway (**Figure 2**). Through several enzymatic steps, squalene can then be cyclized to form the foundation of all steroids, including progesterone (Buhaescu et al. 2007). With this in mind, it seemed reasonable to choose a host that already produces squalene efficiently as the starting point.

![figures/simple_pathway.png](figures/simple_pathway.png)

**Figure 2.** Progesterone biosynthesis.


### 1.2 Literature review of the cell factory

When designing a cell factory, the first key decision is which cell to use as the chassis. The host must be culturable and, to integrate a heterologous pathway, genetically engineerable. Additionally, it is a major advantage if the host is well known in industry, has GRAS (generally recognized as safe) status, and naturally produces the compound of interest or a suitable precursor. Some enzymes in a heterologous pathway may also depend on specific cellular features, such as eukaryotic organelles. This is the case for progesterone production, which rules out prokaryotic hosts. After considering various hosts, including mammalian CHO cells, microalgae, and non-conventional yeasts, we chose baker’s yeast, *Saccharomyces cerevisiae*.


*S. cerevisiae* is the most studied eukaryote and meets all the above-mentioned requirements, except that it does not produce progesterone naturally (Parapouli et al. 2020). It does, however, produce squalene, which is used to produce the steroid ergosterol (Xu et al. 2020). Extensive research on *S. cerevisiae* recently enabled the construction of a strain producing 21 g/L squalene, placing *S. cerevisiae* at the forefront of squalene-producing cell factories (Paramasivan et al. 2022; Zhu et al. 2021). Overproduction of squalene is also a good starting point for the overproduction of steroids, as more precursor will be available.

Several issues challenge the production of steroids in *S. cerevisiae*. One issue is that many steroids are not exported and might be toxic when they accumulate in the cell. Non-conventional yeasts, such as *Yarrowia lipolytica* or *Pichia pastoris*, might help solve this. For example, *Y. lipolytica* also efficiently produces steroid precursors and is known for its ability to accumulate lipids and lipophilic compounds, and therefore possibly also steroids (Xu et al. 2020; Worland et al. 2020; Adrio 2017). *P. pastoris* has an efficient secretion system, which has been suggested to allow extracellular steroid synthesis that avoids the toxic effects of steroid accumulation (Xu et al. 2020; Ahmad et al. 2014). The downside of these non-conventional yeasts is that they are much less studied than *S. cerevisiae*. As computational cell factory design relies heavily on representative and detailed models, we decided to use *S. cerevisiae* as our model organism. We nonetheless believe that the findings for *S. cerevisiae* in this report will also be applicable to other yeasts.



## 2. Problem definition

Current production methods for steroid drugs rely on the extraction of precursors from plants combined with chemical synthesis, which places a burden on the environment. Due to the increasing demand for steroid drugs, the development of efficient and sustainable production methods is a highly relevant and important topic.

**In this project, we aim to design a cell factory of *S. cerevisiae* that is optimized to produce the steroid drug, progesterone.**

First, we analyze different genome-scale models (GSMs) of *S. cerevisiae* to use for our computer-aided analysis and design. We identify the necessary heterologous pathway and implement it in our model. This model serves as the foundation for identifying reaction targets for knock-outs, up-regulation, and down-regulation to improve the progesterone yield. Additionally, we perform a co-factor swapping analysis to test how swapping NAD(H) and NADP(H) in the identified reactions affects growth and progesterone productivity. To understand how our implemented alterations affect growth and progesterone productivity, we generate phenotypic phase plane plots, calculate maximum theoretical yields, and perform a dynamic flux balance analysis of a batch fermentation.
Lastly, we assess the 12 strains we designed to determine which of them is most likely to be the best progesterone-producing cell factory.


## 3. Selection and assessment of existing GSM

Our choice of host organism, *S. cerevisiae*, is very common and well researched, so multiple genome-scale metabolic models (GSMs) exist. GSMs can be used to computationally predict outcomes, which can then be verified experimentally. From BiGG and EMBL-EBI's BioModels, we found four candidate GSMs: iFF708, iMM904, iND750, and yeast-GEM (version 8.6.2, the newest version on GitHub). iND750 is an improved version of iFF708, containing more genes, metabolites, and reactions (Duarte et al. 2004).

Using `memote`, it is possible to assess different quality measures of the GSMs, including stoichiometry, annotation, and reaction/metabolite statistics. Each model was tested individually with `memote`; the results of the `memote` runs can be found in the `models/memote` folder and are summarised in **Table 1**.

**Table 1** shows the `memote` results evaluating the four GSMs (iFF708, iMM904, iND750, and yeast8.6.2).

| Measure | iMM904 | yeast8.6.2 | iFF708 | iND750 |
| ---- | ---- | - | - | - |
| Total Metabolites | 1,226 | 2,744 | 796 | 1,059 |
| Total Reactions | 1,577 | 4,063 | 1,379 | 1,266 |
| Total Genes | 905 | 1,160 | 619 | 750 |
|  Stoichiometric Consistency  | 100.0% | 0.0% | 0.0% | 100.0% |
|  Mass Balance  | 96.0% | 93.7% | 0.0% | 97.3% |
|  Charge Balance  | 98.5% | 98.2% | 100.0% | 100.0% |
|  Metabolite Connectivity  | 100.0% | 100.0% | 100.0% | 100.0% |
|  Unbounded Flux In Default Medium | 76.2% | 58.8% | 71.7% | 83.0% |
|  Metabolite Annotation | 80% | 68% | 25% | 80% |
|  Reaction Annotation | 82% | 65% | 25% | 83% |
|  Gene Annotation | 43% | 54% | 0% | 43% |
|  **Total score** | 85% | 68% | 19% | 86% |


As seen in **Table 1**, yeast8.6.2 contains by far the most metabolites, reactions, and genes. The difference is largest for reactions and metabolites, of which it has roughly two to three times as many as the other GSMs. iMM904, iFF708, and iND750 have more similar numbers of reactions, with iMM904 containing the most metabolites of the three. For stoichiometric consistency, the best models are iMM904 and iND750, which both score 100%. Finally, iND750 has the highest total score (86%), closely followed by iMM904 (85%), then yeast8.6.2 (68%) and iFF708 (19%).

We considered stoichiometric consistency a more important parameter than the number of reactions, as we expect data generated with GSMs with a stoichiometric consistency of 100% to be more reliable than data generated with GSMs with a consistency of 0%. On this basis, we eliminated yeast8.6.2 and iFF708. This left iMM904 and iND750 to choose from. Since the total scores of these two GSMs are very close, we chose iMM904, as it contains more reactions, metabolites, and genes, which we expect to give a more complete and therefore more reliable basis for our simulations.
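The selection rule above amounts to a two-step filter: discard stoichiometrically inconsistent models, then prefer the largest remaining one. The sketch below encodes that rule with the numbers from **Table 1**; the dictionary layout and the function `select_model` are our own illustration, not code from the comparison notebook.

```python
# Sketch of the GSM selection rule, using the memote numbers from Table 1.
# Illustration only; not the code used in the GSM comparison notebook.
MODELS = {
    "iMM904":     {"reactions": 1577, "metabolites": 1226, "genes": 905,  "stoich_consistency": 100.0},
    "yeast8.6.2": {"reactions": 4063, "metabolites": 2744, "genes": 1160, "stoich_consistency": 0.0},
    "iFF708":     {"reactions": 1379, "metabolites": 796,  "genes": 619,  "stoich_consistency": 0.0},
    "iND750":     {"reactions": 1266, "metabolites": 1059, "genes": 750,  "stoich_consistency": 100.0},
}


def select_model(candidates, min_consistency=100.0):
    """Keep fully consistent models, then pick the largest by reactions, metabolites, genes."""
    consistent = {name: m for name, m in candidates.items()
                  if m["stoich_consistency"] >= min_consistency}
    return max(consistent, key=lambda name: (consistent[name]["reactions"],
                                             consistent[name]["metabolites"],
                                             consistent[name]["genes"]))


print(select_model(MODELS))  # iMM904
```

Ranking by size only after the consistency filter reflects the priority stated above: consistency is treated as a hard requirement, size as a tie-breaker between the close total scores.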

_[Notebook: GSM comparison](01_GSM_Comparison.ipynb)_

## 4. Computer-Aided Cell Factory Engineering

#### **Implementation and characterization of the cell factory**

**1. Implementation of heterologous pathway**

Yeast cells naturally produce the steroid ergosterol via a long biosynthetic pathway from the precursor squalene (**Figure 3**). Progesterone can be produced from the intermediates zymosterol and 5-dehydroepisterol via a heterologous pathway (Jiang and Lin 2022). Since progesterone biosynthesis from 5-dehydroepisterol relies on an enzymatic reaction that, to our knowledge, has not been validated, we implemented the heterologous pathway starting from zymosterol.

<!-- However, the biosynthesis of progesterone from 5-dehydroepisterol rely on an enzymatic reaction that to our knowledge is not validated. Therefore, the production of progesterone from zymosterol is the heterologous pathway we have implemented in our model (**Figure 3**). -->

![figures/pathway_med_strukturer_v3.png](figures/pathway_med_strukturer_v3.png)

**Figure 3.** Steroid biosynthesis. Natural ergosterol pathway is shown with a green box and the implemented progesterone heterologous pathway is shown with a blue box. The enzymes are represented by their gene name where endogenous yeast genes are represented in black and heterologous genes are represented in red. Arrows indicate the direction of reaction. Co-enzymes and co-substrates are shown in light grey.

We investigated other potential progesterone production pathways using the `pathway_prediction` algorithm from `cameo` (Cardoso et al. 2018). In all the pathways suggested by `cameo`, zymosterol is converted into progesterone in six steps, but via different routes.
Interestingly, `cameo` found an alternative reaction (MNXR4011) converting cholesterol to pregnenolone that requires only one NADP(H), instead of the six required by the manually curated reaction (CYP11A1). We therefore implemented this reaction instead.

<!-- All `cameo` pathways agree with the implemented heterologous pathway in the way that zymosterol is in four steps converted into cholesterol which is afterwards converted by two steps to progesterone.  -->

_[Notebook: Heterologous pathway implementation](02_heterologous_pathway_implementation.ipynb)_

**2. Calculating the maximum theoretical yield and productivity on default and alternative carbon sources**

At a glucose uptake rate of 10 mmol/(gDW\*h), the maximum theoretical growth rate of the strain was 0.288 /h (objective: growth). With progesterone production as the objective, the maximum theoretical progesterone productivity was 0.167 mmol/(gDW\*h) and the maximum theoretical progesterone yield was 0.017 mmol progesterone/mmol glucose.
When both biomass and progesterone were set as the objective, the values changed: the maximum possible growth rate was 0.119 /h, the maximum progesterone productivity was 0.156 mmol/(gDW\*h), and the maximum progesterone yield was 0.016 mmol progesterone/mmol glucose.
<!-- When the objective was changed, to account for both maximum growth and maximum production of progesterone, the values changed; Maximum possible growth rate was 0.119 /h, the maximum progesterone productivity was 0.156 mmol/(gDW\*h), and the maximum progesterone yield was 0.016 mmol progesterone/mmol glucose. -->

Increasing the availability of glucose in the medium changed the maximum theoretical progesterone productivity only slightly, whereas the maximum theoretical progesterone yield was drastically reduced.
Since the yield is defined as product formed per substrate consumed, increasing the substrate uptake while the productivity stays nearly the same necessarily lowers the yield.
It can therefore be concluded that solely increasing the glucose concentration is not a valid approach for increasing the progesterone yield.
This makes sense, as there are two limiting exchanges in the medium (glucose and $\textrm{O}_{2}$), and both may need to be changed to increase the progesterone yield.
Furthermore, the alternative carbon sources fructose and galactose did not improve the yield.
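The yields quoted above are simply productivity divided by glucose uptake, which is also why raising the uptake at near-constant productivity lowers the yield. A minimal sketch: `molar_yield` is a hypothetical helper, and the doubled-uptake case assumes, for illustration only, that productivity stays at 0.167 mmol/(gDW\*h).

```python
def molar_yield(product_flux, substrate_uptake):
    """Yield in mmol product per mmol substrate; both fluxes in mmol/(gDW*h)."""
    if substrate_uptake <= 0:
        raise ValueError("substrate uptake must be positive")
    return product_flux / substrate_uptake


GLUCOSE_UPTAKE = 10.0  # mmol/(gDW*h), the default bound used above

print(f"{molar_yield(0.167, GLUCOSE_UPTAKE):.4f}")  # 0.0167 (progesterone objective)
print(f"{molar_yield(0.156, GLUCOSE_UPTAKE):.4f}")  # 0.0156 (biomass + progesterone)

# Illustration only: IF doubling the uptake left productivity at 0.167,
# the yield would be halved.
print(f"{molar_yield(0.167, 2 * GLUCOSE_UPTAKE):.4f}")
```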

<!-- It was also found that the alternative carbon sources, fructose and galactose, was utilized not better than the default media containing glucose.  -->

_[Notebook: Maximum theoretical yield](03_maximum_theoretical_yield.ipynb)_

**3. Phenotypic phase plane analysis using `cameo` and `cobrapy`**

To fully elucidate how the cell's production capabilities respond to changes in the medium and objective, we perform a phenotypic phase plane analysis using `cameo` and `cobrapy`. Since our medium only restricts the uptake of oxygen and glucose, these are the parameters we focus on. Before analysing the response to changes in these conditions, we must first understand the trade-off between progesterone production and growth in our cell factory.

<p float="left">
  <img src="figures/04_phenotypic_phase_plane_biomass_progesterone.jpg" width="50%" />
</p>

**Figure 4.** Phenotypic phase plane for progesterone flux (through the demand `DM_progesterone_c`) over biomass flux (cell growth).

**Figure 4** clearly depicts the trade-off between growth and progesterone production. At the highest growth rates, the cell cannot prioritise progesterone production. Interestingly, the reverse does not hold: progesterone production stays at an almost constant plateau up to a growth rate of about $\mu\sim 0.1$ /h. This suggests that if we optimise for progesterone production and choose the maximum, the cell factory is still able to grow.
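A phenotypic phase plane (production envelope) is obtained by fixing the growth rate at a series of values and maximising the product flux at each one. The toy below, with two resources and made-up coefficients (they are not taken from iMM904), shows how a plateau like the one in **Figure 4** can arise: product formation is limited by a resource (here oxygen) that growth does not consume, so growth can increase without lowering the maximum product flux until glucose becomes limiting. Whether this is the mechanism in our model is not established by this sketch.

```python
# Toy production envelope: fix the growth rate mu, maximise product flux p.
# ALL coefficients are HYPOTHETICAL round numbers, not values from iMM904.
GLC_MAX = 10.0   # glucose uptake bound, mmol/(gDW*h)
O2_MAX = 2.0     # oxygen uptake bound, mmol/(gDW*h)

# Resource cost per unit growth rate and per unit product flux (hypothetical).
GLC_PER_GROWTH, O2_PER_GROWTH = 25.0, 0.0
GLC_PER_PRODUCT, O2_PER_PRODUCT = 20.0, 10.0


def max_growth():
    """Largest feasible growth rate when no product is made."""
    limits = [GLC_MAX / GLC_PER_GROWTH]
    if O2_PER_GROWTH > 0:
        limits.append(O2_MAX / O2_PER_GROWTH)
    return min(limits)


def max_product(mu):
    """Solve  max p  s.t.  GLC_PER_GROWTH*mu + GLC_PER_PRODUCT*p <= GLC_MAX,
                           O2_PER_GROWTH*mu  + O2_PER_PRODUCT*p  <= O2_MAX,
                           p >= 0.
    With a single free variable, the optimum is the tightest resource bound.
    """
    glc_bound = (GLC_MAX - GLC_PER_GROWTH * mu) / GLC_PER_PRODUCT
    o2_bound = (O2_MAX - O2_PER_GROWTH * mu) / O2_PER_PRODUCT
    return max(0.0, min(glc_bound, o2_bound))


def envelope(points=9):
    """Return (mu, max product flux) pairs from mu = 0 up to the maximum growth rate."""
    mu_top = max_growth()
    return [(mu, max_product(mu))
            for mu in (mu_top * i / (points - 1) for i in range(points))]


for mu, p in envelope():
    print(f"mu = {mu:.2f} /h -> max product flux = {p:.3f} mmol/(gDW*h)")
```

In a genome-scale model, the same scan is done by an LP solver over thousands of reactions, but each point on the curve is still "fix growth, maximise product".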

But which conditions give the highest productivity and, perhaps more importantly, the highest yield? By plotting the productivities and yields of progesterone and biomass in the ranges $-10 < \textrm{O}_2 < 0$ and $-20 < \textrm{Glc} < 0$ (fluxes in mmol/(gDW\*h); negative values denote uptake), we explore the entire realistic range of values in search of the optimum.

<p float="left">
  <img src="figures/04_phenotypic_phase_plane_progesterone_productivity.jpg" width="40%" />
  <img src="figures/04_phenotypic_phase_plane_biomass_productivity.jpg" width="40%" />
</p>

**Figure 5(a, b).** Phenotypic phase plane for progesterone and biomass productivity/flux (where each is maximised) as a function of oxygen and glucose.

<br/>

<p float="left">
  <img src="figures/04_phenotypic_phase_plane_progesterone_yield.jpg" width="40%" />
  <img src="figures/04_phenotypic_phase_plane_biomass_yield.jpg" width="40%" />
</p>

**Figure 6(a, b).** Phenotypic phase plane for progesterone and biomass yield (where each is maximised) as a function of oxygen and glucose.

In **Figure 5a** and **5b**, we observe that increasing the glucose flux generally increases growth, whereas progesterone productivity quickly stagnates and remains constant as glucose increases; it appears to depend much more on the $\textrm{O}_2$ flux. Progesterone productivity rises with oxygen flux up to an optimum, but beyond that, higher oxygen flux decreases productivity until it reaches 0 at very high oxygen levels. Biologically, this may be explained by oxygen toxicity, although the model itself contains no explicit toxicity term.

The difference in how glucose affects progesterone and biomass production can also be seen in the respective yield plots **Figure 6a** and **6b**. Progesterone yield decreases as glucose flux increases, while biomass yield approaches a constant.

For low values of glucose, varying the oxygen for either objective gives the same tendency: an optimal ridge with high yield. For the default oxygen level in our model `EX_o2_e = -2`, we find the ridge in the sectional plots **Figure 7a** and **7b**:

<p float="left">
  <img src="figures/04_phenotypic_phase_plane_progesterone_yield_optimum.jpg" width="35%" />
  <img src="figures/04_phenotypic_phase_plane_biomass_yield_optimum.jpg" width="35%" />
</p>

**Figure 7(a, b).** Phenotypic phase plane for progesterone and biomass yield (where each is maximised) as a function of glucose for oxygen flux at -2. At the maximum yield we find the conditions in **Table 2**, shown below.

*Table 2. Phenotypic phase plane analysis results.*

| Objective | Productivity | Yield (per mmol glucose) | Glucose flux (mmol/(gDW\*h)) | Oxygen flux (mmol/(gDW\*h)) |
| - | - | - | - | - |
| Progesterone | 0.084 | 0.098 | -0.856 | -2.0 |
| Biomass | 0.070 | 0.082 | -0.856 | -2.0 |

Interestingly, we observe the same optimal glucose flux for both objectives.
The corresponding carbon yield of progesterone is 0.344 Cmol/Cmol.
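The values in **Table 2** can be cross-checked by hand: the molar yield is the flux divided by the glucose uptake, and the carbon yield rescales the molar yield by the carbon content of each molecule (21 C in progesterone, 6 C in glucose). The sketch below uses the rounded table values, so it prints 0.343 rather than the reported 0.344; the small difference is presumably due to rounding of the inputs.

```python
GLUCOSE_C = 6         # carbon atoms in glucose, C6H12O6
PROGESTERONE_C = 21   # carbon atoms in progesterone, C21H30O2

glucose_flux = 0.856       # mmol/(gDW*h), magnitude of the glucose uptake in Table 2
progesterone_flux = 0.084  # mmol/(gDW*h)
biomass_flux = 0.070       # 1/h

molar_yield = progesterone_flux / glucose_flux          # mmol/mmol
biomass_yield = biomass_flux / glucose_flux             # gDW/mmol
cmol_yield = molar_yield * PROGESTERONE_C / GLUCOSE_C   # Cmol/Cmol

print(f"progesterone: {molar_yield:.3f} mmol/mmol, {cmol_yield:.3f} Cmol/Cmol")
print(f"biomass: {biomass_yield:.3f} gDW/mmol")
```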

_[Notebook: Phenotypic phase plane analysis](04_phenotypic_phase_plane_analysis.ipynb)_

#### **Cell factory engineering strategies**

<!-- Cell factories can be engineered in various ways in order to make them stable and productive.  -->
<!-- Using computer-aided cell factory design, we investigated different cell factory engineering strategies for improving the progesterone producing *S. cerevisiae* strain (iMM904_progesterone). -->

**1. Gene targets for knock-outs**
<!-- **1. Knocking out ERG5 and ERG6** -->
To increase the flux towards progesterone production, we searched for knock-out gene targets using `OptGene` and the literature.

`OptGene` is an evolutionary-programming-based tool for finding knock-out targets (Patil et al. 2005). Unfortunately, `OptGene` did not identify any knock-out targets.

<!-- The heterologous pathway for production of progesterone starts from zymosterol which naturally is an important precursor for ergosterol (**Figure 3**).  -->
Knocking out ERG5 and ERG6 (**Figure 3**) is a common engineering strategy for improving the production of cholesterol and similar steroids derived from a precursor in the ergosterol pathway (Jiang and Lin 2022; Xu and Li 2020).
<!-- Therefore, we investigated the effect of knocking out these genes in our model optimized for growth and progesterone productivity.  -->
Surprisingly, knocking out ERG5 and ERG6 in our model had no effect on cell growth (µ = 0.119 /h) or progesterone productivity (0.156 mmol/(gDW\*h)) when biomass and progesterone were set as the objective.
This might be because the flux through ERG5 and ERG6 in this optimized model is already 0, which is equivalent to knocking them out (for further details, see _[Notebook: Gene target analysis](05_gene_target_analysis.ipynb)_).
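The reasoning behind the unchanged optimum can be made explicit. Knocking out a reaction only removes feasible flux distributions, so it can never raise the optimal objective; and if the reaction already carries zero flux in the optimal solution, that solution stays feasible, so the optimum cannot fall either. For a gene, every reaction it disables must carry zero flux. The sketch below applies this check to a flux dictionary; the reaction IDs and values are hypothetical placeholders.

```python
def unaffected_knockouts(fluxes, candidates, tol=1e-9):
    """Return candidates whose flux in the optimal solution is (numerically) zero.

    Knocking out such a reaction keeps the current optimum feasible, so the
    optimal objective value cannot change.
    """
    return [rxn for rxn in candidates if abs(fluxes.get(rxn, 0.0)) <= tol]


# HYPOTHETICAL reaction IDs and fluxes, for illustration only.
optimal_fluxes = {"ERG5_rxn": 0.0, "ERG6_rxn": 0.0, "progesterone_rxn": 0.156}
print(unaffected_knockouts(optimal_fluxes, ["ERG5_rxn", "ERG6_rxn", "progesterone_rxn"]))
# ['ERG5_rxn', 'ERG6_rxn']
```

This is a sufficient, not a necessary, condition: with alternative optima, a reaction carrying flux in one optimal solution might still be removable without loss. In `cobrapy`, the direct test is to call `knock_out()` on the gene or reaction inside a `with model:` block and re-optimise.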
<!-- if knocking out ERG5 and ERG6 in our model improves the progesterone production. -->

<!-- Surprisingly, our simulation showed that knocking out ERG5 and ERG6 had no effect on cell growth (µ = 0.119 /h) or progesterone productivity (0.156 mmol/(gDW*h)) when biomass and progesterone were set to be the objective.  -->

<!-- **Rune: I think this needs to be cut down a bit. Maybe we could leave out some of the verification of the knockouts? I.e. the section on HSD3B and the 61 changed reactions. I also don't necessarily think we need to refer to the table here(?) And I think we should also cut a bit so we get to the conclusions a bit faster (?) Maybe sections 1 and 2 could be combined into one (Knockout analysis). It is a bit odd to have a section on OptGene that just says it doesn't work**

**Caro: I have deleted some of it; maybe more should be deleted**

_[Notebook: Gene target analysis](05_gene_target_analysis.ipynb)_ -->

**2. Up- and downregulation targets using FSEOF**

Flux Scanning based on Enforced Objective Flux (FSEOF) identifies gene targets for up- or down-regulation that increase the flux towards the compound of interest (Choi et al. 2010).
By performing an FSEOF analysis, we identified 117 reactions with a large flux change when the flux was enforced towards progesterone.
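In FSEOF, biomass formation is maximised while the product flux is enforced at increasing levels, and reactions whose flux magnitude consistently increases (or decreases) without changing direction are flagged as up-regulation (or down-regulation) targets. The sketch below shows only this classification step, applied to hypothetical flux profiles with one entry per enforced level; in a real analysis, each value would come from an FBA solution at that level. The reaction names and numbers are illustrative, not values from our model.

```python
def classify(profile, tol=1e-9):
    """Return 'up', 'down' or None for one reaction's flux profile."""
    # Reject profiles whose flux changes direction between enforced levels.
    if any(a * b < -tol for a, b in zip(profile, profile[1:])):
        return None
    mags = [abs(v) for v in profile]
    steps = [b - a for a, b in zip(mags, mags[1:])]
    if all(s >= -tol for s in steps) and mags[-1] - mags[0] > tol:
        return "up"      # candidate for up-regulation
    if all(s <= tol for s in steps) and mags[0] - mags[-1] > tol:
        return "down"    # candidate for down-regulation
    return None


# 10 enforced progesterone flux levels, 0.015 mmol/(gDW*h) apart
# (starting at 0 here for simplicity).
enforced = [0.015 * i for i in range(10)]
# HYPOTHETICAL flux profiles, one value per enforced level.
scan = {
    "cycle_rxn": [0.5 * i for i in range(10)],
    "competing_rxn": [1.0 - 0.1 * i for i in range(10)],
    "flat_rxn": [0.3] * 10,
}
assert all(len(p) == len(enforced) for p in scan.values())
targets = {rxn: classify(p) for rxn, p in scan.items()}
print(targets)
```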

<!-- is a tool for identification of gene amplification targets (Choi, Hyung Seok, et al. 2010).  -->
<!-- The flux is increased for a compound of interest (enforced objective) at the same time of maximizing biomass formation flux.  -->
<!-- The output is reactions of which the flux changes when increasing the flux towards the compound of interest making these reactions good targets for up- or downregulation.  -->
<!-- We performed the analysis on our model (iMM904_progesterone) setting the enforced objective to be progesterone.  -->
<!-- The flux of 117 reactions changed as a result of an increasing flux towards progesterone.  -->
<!-- Reactions which are a part of the heterologous pathway were removed because they are not relevant targets: It is obvious that their increase in flux follows the same as the enforced increased progesterone flux. Also, reactions which flux was close to 0 were not of interest. -->

**Figure 8** shows the 20 reactions (including the DM_progesterone_c reference) with the highest relative flux change; these are therefore promising targets for up- or down-regulation.
The most up-regulated reactions are G3PD1ir, GLYCDy, G3PT, and DHAK, each with a flux increase of 4.58 mmol/(gDW\*h).
These reactions form a cycle in which NAD(+) and NADPH are formed.
By increasing the flux through this cycle, the supply of NAD(+) and NADPH in the cell increases.
Since NAD(+) and NADPH are used to produce progesterone (**Figure 3**), it makes sense that a higher supply of these co-factors allows a higher flux towards progesterone.
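Summing the four reactions shows what the cycle does overall. The stoichiometries below use BiGG metabolite IDs and are written as we understand them in iMM904; treat them as an assumption to be checked against the model. Under that assumption, the net effect is NADH + NADP(+) + ATP + H2O → NAD(+) + NADPH + ADP + Pi + H(+), i.e. the cycle behaves like an ATP-driven transhydrogenase.

```python
from collections import Counter

# ASSUMED stoichiometries (negative = consumed); verify against the model.
CYCLE = {
    "G3PD1ir": {"dhap_c": -1, "h_c": -1, "nadh_c": -1, "glyc3p_c": 1, "nad_c": 1},
    "G3PT":    {"glyc3p_c": -1, "h2o_c": -1, "glyc_c": 1, "pi_c": 1},
    "GLYCDy":  {"glyc_c": -1, "nadp_c": -1, "dha_c": 1, "h_c": 1, "nadph_c": 1},
    "DHAK":    {"atp_c": -1, "dha_c": -1, "adp_c": 1, "dhap_c": 1, "h_c": 1},
}


def net_reaction(reactions, flux=1.0):
    """Sum the stoichiometries of reactions that all carry the same flux."""
    total = Counter()
    for stoich in reactions.values():
        for met, coeff in stoich.items():
            total[met] += coeff * flux
    # Drop metabolites that are fully recycled within the cycle.
    return {met: c for met, c in total.items() if c != 0}


print(net_reaction(CYCLE))
```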

<!-- it makes sense that when these co-factors are increased then the flux through progesterone is increased as well. -->

![figures/05_flux_of_upregulated_genes.png](figures/05_flux_of_upregulated_genes.png)

**Figure 8.** The 20 reactions (including the DM_progesterone_c reference) with the highest relative flux change when increasing the progesterone flux. The x-axis shows the enforced progesterone flux at 10 levels, increasing by 0.015 mmol/(gDW\*h) per step.
In total, the progesterone flux is increased by 0.135 mmol/(gDW\*h).

We investigated the influence of this reaction cycle (G3PD1ir, GLYCDy, G3PT, and DHAK) on progesterone production (**Figure 9**). Including the optimized reaction cycle in the model resulted in a 3.91% increase in maximum progesterone productivity at $\mu$ = 0.1187 /h, compared with a model in which no cycling occurs.
The model thus suggests that up-regulating the reaction cycle will result in higher progesterone production.

![figures/05_phase_plane.png](figures/05_phase_plane.png)

**Figure 9.** Phase plane plot of progesterone productivity (mmol/(gDW\*h)) and growth rate (/h). The blue line reflects the model when it has the optimized reaction cycle (G3PD1ir, GLYCDy, G3PT, and DHAK). The orange line reflects the model when the reaction cycle is turned off.

_[Notebook: Gene target analysis](05_gene_target_analysis.ipynb)_

**3. Co-factor swap targets**

The balance of co-factors within a cell is important for obtaining a high theoretical yield of a given product (King and Feist 2014). In our implemented pathway, the cell uses four NADPH and one NAD(+) to produce progesterone.
Because of this extensive use of NADPH, we investigated whether we could improve the co-factor balance by producing more NADPH at the expense of NADH.

Using the algorithm `CofactorSwapOptimization`, we identified 20 reactions in which swapping the co-factor NAD(H) for NADP(H) could potentially increase progesterone productivity.
Of these 20 reactions, the GAPD reaction in glycolysis appeared to have the most potential for further investigation.

The GAPD reaction produces NADH as follows:

$$\textrm{glyceraldehyde-3-phosphate} + \textrm{NAD}^{+} + \textrm{P}_\textrm{i} \rightleftharpoons \textrm{3-phospho-D-glyceroyl phosphate} + \textrm{H}^{+} + \textrm{NADH}$$

We replaced this reaction with a similar one producing NADPH instead, as described in _[Notebook: co-factor swap](06_cp-factor_swap.ipynb)_, and tested how this affects the theoretical maximum progesterone and biomass productivity.
The theoretical progesterone productivity did not increase, but the theoretical growth rate increased by 2%.
While this might not seem impressive, it is more informative to plot the phase plane of progesterone productivity versus biomass productivity for the original model and for the model with NAD(H) swapped for NADP(H) in GAPD (**Figure 10**).
This plot reveals that the co-factor swap allows maximum progesterone production at much higher growth rates. For the initial model, the progesterone productivity starts to decrease at a biomass productivity of around 0.10 /h. With the co-factor swap, it starts to decrease at around 0.16 /h. In other words, the highest growth rate compatible with maximum progesterone productivity increases by 60%.
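Conceptually, the swap creates an NADP-dependent copy of GAPD by renaming its co-factors, and the new reaction replaces the original one in the model. The sketch below does this on a plain stoichiometry dictionary with BiGG metabolite IDs (an illustration, not the notebook's code) and also reproduces the 60% figure from the two growth rates read off **Figure 10**.

```python
# Co-factor swap on a plain stoichiometry dict (negative = consumed).
SWAP = {"nad_c": "nadp_c", "nadh_c": "nadph_c"}


def swap_cofactors(stoich, mapping=SWAP):
    """Return a new stoichiometry dict with NAD(H) replaced by NADP(H)."""
    return {mapping.get(met, met): coeff for met, coeff in stoich.items()}


# GAPD as written above, using BiGG metabolite IDs.
GAPD = {"g3p_c": -1, "nad_c": -1, "pi_c": -1, "13dpg_c": 1, "h_c": 1, "nadh_c": 1}
GAPD_nadp = swap_cofactors(GAPD)
print(GAPD_nadp)

# Growth rate at which progesterone productivity starts to drop (from Figure 10).
before, after = 0.10, 0.16
print(f"{(after - before) / before:.0%}")  # 60%
```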

<!-- , where the progesterone productivity would have decreased from 0.167 to about 0.125 in the initial model. -->

<!-- _[Notebook: co-factor swap](06_cp-factor_swap.ipynb)_ -->


![figures/06_phase_plane.png](figures/06_phase_plane.png)

**Figure 10.** The phase plane of progesterone productivity (mmol/(gDW\*h)) and growth rate (/h) for the model with and without NAD(H) swapped for NADP(H) in GAPD.

_[Notebook: Co-factor swap](06_co-factor_swap.ipynb)_

**4. Dynamic Flux Balance Analysis**

Calculating a single number for the progesterone yield and flux gives little information about what the final titres will be. 
We performed Dynamic Flux Balance Analysis (DFBA) to estimate the final titres of biomass, progesterone, and the precursor squalene.
<!-- To get a better idea of how the progesterone and biomass titres change over time, w -->
<!-- it can be insightful to mimic real conditions by simulating a simple batch fermentation. This is the purpose of Dynamic Flux Based Analysis (DFBA). Using DFBA, we estimated the titre of progesterone and also the precursor squalene in order to compare our estimates with experimental results from literature. -->

Since our model with pathway 1 and the co-factor swap implemented appeared to be one of the most promising strains, we used it to simulate an aerobic batch fermentation with a constant $\textrm{O}_{2}$ level of 2 mmol/L and an initial glucose concentration of 10 mmol/L.

The simulation is visualized in **Figure 11**. The batch fermentation ran for 5.1 hours before all the glucose was consumed. The final progesterone titre reached 0.212 mmol/L (0.067 g/L) and the squalene titre reached 0.214 mmol/L (0.088 g/L). In our simulation, there is a linear relationship between the initial glucose concentration and the final progesterone titre (see _[Notebook: DFBA](07_DFBA.ipynb)_). This is unlikely to be fully realistic, but adding more glucose, up to a certain level, would still likely result in a higher titre.
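DFBA in its common static-optimisation form alternates between solving an FBA problem for the current extracellular conditions and integrating biomass and metabolite concentrations over a short time step. The sketch below keeps that loop structure but replaces the FBA step with fixed yields so that it runs without an LP solver, and it does not model oxygen. The growth yield (0.0288 gDW/mmol, from 0.288 /h at 10 mmol/(gDW\*h)) and the progesterone yield (0.0212 mmol/mmol, from 0.212 mmol/L progesterone out of 10 mmol/L glucose) come from the numbers above but are not jointly derived from the model, and the inoculum `biomass0` is hypothetical. With constant yields, the final titre is necessarily proportional to the initial glucose, which is one way to read the linear relationship noted above. The helper `mmol_to_g` reproduces the g/L conversions using molecular weights of about 314.47 g/mol (progesterone) and 410.73 g/mol (squalene).

```python
# Static-optimisation DFBA loop with explicit Euler steps. The FBA step is
# REPLACED by fixed yields (an assumption) so the sketch runs without a solver;
# oxygen is not modelled.
GROWTH_YIELD = 0.0288     # gDW per mmol glucose (0.288 /h at 10 mmol/(gDW*h))
PROG_YIELD = 0.0212       # mmol progesterone per mmol glucose (0.212 mM / 10 mM)
GLC_UPTAKE_MAX = 10.0     # mmol/(gDW*h), uptake bound used in the text
MW_PROGESTERONE = 314.47  # g/mol, C21H30O2
MW_SQUALENE = 410.73      # g/mol, C30H50


def mmol_to_g(conc_mmol_per_l, mw):
    """Convert a titre from mmol/L to g/L."""
    return conc_mmol_per_l * mw / 1000.0


def dfba(glucose0, biomass0=0.05, dt=0.01, t_end=24.0):
    """Simulate a batch culture; return (time_h, glucose_mM, biomass_gDW_L, progesterone_mM).

    biomass0 is a HYPOTHETICAL inoculum in gDW/L.
    """
    t, glc, x, prog = 0.0, glucose0, biomass0, 0.0
    while t < t_end and glc > 1e-9:
        # "FBA" step: uptake limited by the bound and by the glucose left in this step
        v_glc = min(GLC_UPTAKE_MAX, glc / (x * dt))
        mu = GROWTH_YIELD * v_glc   # specific growth rate, 1/h
        q_p = PROG_YIELD * v_glc    # specific progesterone production, mmol/(gDW*h)
        # Euler update of the extracellular state
        glc -= v_glc * x * dt
        prog += q_p * x * dt
        x += mu * x * dt
        t += dt
    return t, max(glc, 0.0), x, prog


for g0 in (10.0, 20.0):
    t, glc, x, prog = dfba(g0)
    print(f"glucose0 = {g0:4.1f} mM -> progesterone {prog:.3f} mM "
          f"({mmol_to_g(prog, MW_PROGESTERONE):.3f} g/L) after {t:.1f} h")

print(f"squalene 0.214 mM = {mmol_to_g(0.214, MW_SQUALENE):.3f} g/L")
```

In a full DFBA, the fixed yields would be replaced by an FBA solve at every step, which is what allows uptake, growth, and production to respond to the changing medium.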

![figures/07_Bacth_fermentation_simul.png](figures/07_Bacth_fermentation_simul.png)

**Figure 11.** Simulation of a batch fermentation with DFBA. Initial glucose concentration was 10 mmol/L and a constant O2 level of 2 mmol/L. The final progesterone titre reached 0.21 mmol/L.

_[Notebook: Dynamic Flux Balance Analysis](07_DFBA.ipynb)_

#### **Promising cell factory designs**

**1. Metabolic pathway visualisations using `Escher`**

The computed fluxes were visualized using the online version of `escher`.
The fluxes through the central carbon metabolism (**Figure 12**) and the heterologous pathway (**Figure 13**) were visualized. The two color scales are identical.
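The Escher web app can load reaction data from a JSON file that maps reaction IDs to flux values. A minimal export sketch: the flux values are hypothetical, and with `cobrapy` one would typically pass `solution.fluxes.to_dict()` from an FBA solution instead of a hand-written dictionary.

```python
import json
import os
import tempfile


def export_reaction_data(fluxes, path):
    """Write a {reaction_id: flux} mapping to a JSON file for Escher."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({rxn: float(v) for rxn, v in fluxes.items()}, fh, indent=2)
    return path


# HYPOTHETICAL flux values in mmol/(gDW*h), keyed by BiGG reaction IDs.
fluxes = {"GAPD": 17.2, "PYK": 16.9, "DM_progesterone_c": 0.156}
out = export_reaction_data(fluxes, os.path.join(tempfile.gettempdir(), "fluxes.json"))
print(out)
```

Keeping reaction IDs identical to those on the map is what lets Escher color the matching arrows; IDs absent from the map are simply not shown.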

![figures/Central_carbon_metabolism.png](figures/Central_carbon_metabolism.png)
**Figure 12.** The fluxes of the central carbon metabolism. Created using the online version of `escher` (https://escher.github.io/#/). *Red* represents the highest flux, *blue* is the lowest flux, and *purple* represents flux somewhere between the *red* and *blue* flux values. Lastly, *grey* represents no flux.

From **Figure 12**, it can be seen that one of the highest fluxes leads to ethanol. This could be a result of overflow metabolism, known in yeast as the Crabtree effect, in which cells ferment glucose to ethanol even under aerobic conditions instead of fully respiring it (Malina et al. 2021).

![figures/Heterologous_pathway.png](figures/Heterologous_pathway.png)
**Figure 13.** Flux through the pathway producing progesterone. Created using the online version of escher (https://escher.github.io/#/). *Red* represents flux through the pathway, *grey* represents no flux.

The heterologous pathway producing progesterone is shown in **Figure 13**. The pathway through cholesta-8-en-3beta-ol appears to be preferred over the pathway through cholesta-7,24-dien-3beta-ol for progesterone production. However, the loaded fluxes come from a single simulation, and since FBA solutions are often not unique, an alternative optimal solution could route the flux through the other pathway instead.

**2. Strain assessment**


Twelve cell factory designs were assessed under standard medium conditions, i.e. a glucose uptake of 10 mmol/(gDW\*h) and an $\text{O}_2$ uptake of 2 mmol/(gDW\*h) (**Table 3** and **Figure 14**).
The four best-performing designs (models 3, 4, 11, and 12) perform equally well (**Figure 14**). However, because they require different numbers of modifications, they receive different scores, leaving model 3 as the top-scoring design (**Table 3**).
Interestingly, all six models with co-factor swapping are among the seven best-scored models, suggesting that this modification is important for a high-performing design. Adding upregulation of NAD(+)/NADPH cycling on top of co-factor swapping (models 4, 8, and 12 compared with models 3, 7, and 11) causes no observable change in performance (**Figure 14**).
This suggests that co-factor swapping takes over the role of NAD(+)/NADPH cycling. Moreover, upregulating NAD(+)/NADPH cycling without co-factor swapping improves performance only slightly compared with models containing only the heterologous pathway (**Figure 14**), and once the extra modifications are taken into account, this upregulation actually lowers the score (**Table 3**).
Models with the manually derived pathway (models 5-8) follow the same trends as those with the other pathways, but reach a lower maximum progesterone yield (0.0143 vs. 0.0167 mmol/mmol, **Table 3**).
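
Co-factor swapping means replacing NAD(H) by NADP(H) (or vice versa) in a reaction, so that the reaction supplies the co-factor the pathway needs. The sketch below applies such a swap to a reaction stored as a plain dictionary. The BiGG-style metabolite IDs (`nad_c`, `nadph_c`, ...) and the choice of glyceraldehyde-3-phosphate dehydrogenase as the target are assumptions for illustration only; this section does not list which reactions were swapped in the co-swap models.

```python
# Assumed mapping from NAD(H) to NADP(H), using BiGG-style metabolite IDs
NAD_TO_NADP = {"nad_c": "nadp_c", "nadh_c": "nadph_c"}


def swap_cofactors(stoichiometry, mapping):
    """Return a copy of a reaction with co-factors renamed according to `mapping`.

    Coefficients are summed, so a metabolite that already exists under its new
    name is merged rather than overwritten; cancelled metabolites are dropped.
    """
    swapped = {}
    for met, coeff in stoichiometry.items():
        new_met = mapping.get(met, met)
        swapped[new_met] = swapped.get(new_met, 0) + coeff
    return {met: c for met, c in swapped.items() if c != 0}


# NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (illustrative choice)
gapd = {"g3p_c": -1, "nad_c": -1, "pi_c": -1, "13dpg_c": 1, "nadh_c": 1, "h_c": 1}
gapd_nadp = swap_cofactors(gapd, NAD_TO_NADP)

# The swapped reaction now produces NADPH instead of NADH
print(sorted(gapd_nadp.items()))
# [('13dpg_c', 1), ('g3p_c', -1), ('h_c', 1), ('nadp_c', -1), ('nadph_c', 1), ('pi_c', -1)]
```

In a genome-scale model the swap would typically be introduced as an alternative reaction rather than by editing the original, so that the solver can choose between the NAD- and NADP-dependent versions.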

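**Table 3**. Quantitative strain assessment of 12 different cell factory designs, sorted by score. Yields are given in mmol progesterone per mmol glucose, and bold values mark the best value in each column. co-swap: co-factor swapping; up NAD/NADPH cycling: upregulated NAD(+)/NADPH cycling.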

|Model number |Features and modifications |Number of modifications |Max µ (/h) |Max progesterone yield (mmol/mmol) |Optimized µ (/h) |Optimized progesterone yield (mmol/mmol) |Progesterone yield at µ=0.18 (mmol/mmol) |Score |
| -  | - | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Model 3 | Pathway 1, co-swap | 8 | **0.2937** | **0.0167** | 0.1771 | 0.0155 | **0.0152** | **90.2%** |
| Model 4 | Pathway 1, up NAD/NADPH cycling, co-swap | 12 | **0.2937** | **0.0167** | 0.1771 | 0.0155 | **0.0152** | 86.0% |
| Model 11 | Combined pathway, co-swap | 12 | **0.2937** | **0.0167** | 0.1771 | 0.0155 | **0.0152** | 86.0% |
| Model 7 | Manual pathway, co-swap | 8 | **0.2937** | 0.0143 | **0.1919** | 0.0133 | 0.0139 | 85.4% |
| Model 1 | Pathway 1 | **4** | 0.2879 | **0.0167** | 0.1113 | **0.0156** | 0.01 | 82.8% |
| Model 12 | Combined pathway, up NAD/NADPH cycling, co-swap | 16 | **0.2937** | **0.0167** | 0.1771 | 0.0155 | **0.0152** | 81.9% |
| Model 8 | Manual pathway, up NAD/NADPH cycling, co-swap | 12 | **0.2937** | 0.0143 | **0.1919** | 0.0133 | 0.0139 | 81.2% |
| Model 2 | Pathway 1, up NAD/NADPH cycling | 8 | 0.2879 | **0.0167** | 0.1187 | **0.0156** | 0.0104 | 79.7% |
| Model 9 | Combined pathway | 8 | 0.2879 | **0.0167** | 0.1113 | **0.0156** | 0.01 | 78.6% |
| Model 5 | Manual pathway | **4** | 0.2879 | 0.0143 | 0.1237 | 0.0135 | 0.0092 | 78.3% |
| Model 10 | Combined pathway, up NAD/NADPH cycling | 12 | 0.2879 | **0.0167** | 0.1187 | **0.0156** | 0.0104 | 75.5% |
| Model 6 | Manual pathway, up NAD/NADPH cycling | 8 | 0.2879 | 0.0143 | 0.1313 | 0.0135 | 0.0096 | 75.3% |
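
Several designs in Table 3 have identical simulated phenotypes but different numbers of modifications (for example models 3 and 4, or models 1 and 9), so the table itself shows how strongly the score penalises extra modifications. The sketch below only does that arithmetic on the values copied from Table 3; it does not reproduce the scoring function, which is not given in this section.

```python
from collections import defaultdict
from itertools import combinations

# (model, number of modifications, phenotype metrics, score in %) copied from Table 3.
# Metrics: max mu, max yield, optimized mu, optimized yield, yield at mu = 0.18
TABLE_3 = [
    ("Model 3", 8, (0.2937, 0.0167, 0.1771, 0.0155, 0.0152), 90.2),
    ("Model 4", 12, (0.2937, 0.0167, 0.1771, 0.0155, 0.0152), 86.0),
    ("Model 11", 12, (0.2937, 0.0167, 0.1771, 0.0155, 0.0152), 86.0),
    ("Model 7", 8, (0.2937, 0.0143, 0.1919, 0.0133, 0.0139), 85.4),
    ("Model 1", 4, (0.2879, 0.0167, 0.1113, 0.0156, 0.01), 82.8),
    ("Model 12", 16, (0.2937, 0.0167, 0.1771, 0.0155, 0.0152), 81.9),
    ("Model 8", 12, (0.2937, 0.0143, 0.1919, 0.0133, 0.0139), 81.2),
    ("Model 2", 8, (0.2879, 0.0167, 0.1187, 0.0156, 0.0104), 79.7),
    ("Model 9", 8, (0.2879, 0.0167, 0.1113, 0.0156, 0.01), 78.6),
    ("Model 5", 4, (0.2879, 0.0143, 0.1237, 0.0135, 0.0092), 78.3),
    ("Model 10", 12, (0.2879, 0.0167, 0.1187, 0.0156, 0.0104), 75.5),
    ("Model 6", 8, (0.2879, 0.0143, 0.1313, 0.0135, 0.0096), 75.3),
]


def penalty_per_modification(rows):
    """Score drop (percentage points) per extra modification between designs
    whose simulated phenotype metrics are identical."""
    groups = defaultdict(list)
    for name, n_mods, metrics, score in rows:
        groups[metrics].append((n_mods, score, name))
    penalties = {}
    for members in groups.values():
        # Sorted by number of modifications, so n2 >= n1 in every pair
        for (n1, s1, a), (n2, s2, b) in combinations(sorted(members), 2):
            if n2 > n1:  # same phenotype, more modifications
                penalties[(a, b)] = (s1 - s2) / (n2 - n1)
    return penalties


penalties = penalty_per_modification(TABLE_3)
for (fewer, more), p in penalties.items():
    print(f"{fewer} -> {more}: {p:.3f} points per extra modification")
```

With the values in Table 3, every such pair gives a penalty of roughly 1.0 to 1.05 percentage points per additional modification. This is why model 3 ranks above models 4, 11, and 12 despite identical simulated performance.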


![figures/08_strain_assessment_phase_plan_plot.png](figures/08_strain_assessment_phase_plan_plot.png)

**Figure 14.** Phenotypic phase planes of progesterone productivity (mmol/(gDW\*h)) versus growth rate (/h) for the 12 cell factory designs.



_[Notebook: Strain assessment](08_strain_assessment.ipynb)_

## 5. Discussion

We successfully designed and evaluated 12 progesterone-producing *S. cerevisiae* cell factories. The progesterone and squalene titres reached by our four best-performing strains are low compared with those achieved experimentally in the literature (Paramasivan et al. 2022), especially the record squalene titre of 21 g/L (Zhu et al. 2021). However, this record is difficult to compare with our results, as it was achieved with advanced compartmentalization engineering strategies followed by a two-stage fed-batch fermentation (Zhu et al. 2021). Simulating these strategies was beyond the scope of this project.

The features that are likely to have the biggest effect on progesterone productivity are the choice of pathway and the co-factor balance of NADP(H) and NAD(H) (**Figure 14**). Choosing pathway 1 (or the combined pathway) instead of the manually derived pathway increases the theoretical maximum progesterone yield, and improving the co-factor balance of NADP(H) and NAD(H) increases the growth rate when maximum progesterone productivity is prioritized. All of the investigated modifications relate to the availability of NADP(H) in the cell, which suggests that NADP(H) availability is important for both growth and progesterone productivity.

Besides NADP(H) availability, the cell factory depends on suitable substrate levels in the growth medium, particularly of oxygen and glucose. Our phase plane analysis suggests that optimal progesterone yield requires a higher oxygen uptake than glucose uptake. The carbon yield of model 1 is 0.344 Cmol/Cmol, meaning that roughly a third of the input carbon ends up in progesterone. In theory, this could be increased with higher glucose and, especially, oxygen uptake fluxes. In practice, however, a large increase in oxygen supply is probably unrealistic, for example because of oxygen toxicity and the limited oxygen transfer in fermenters.
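
Carbon-molar yields (Cmol/Cmol) and the molar yields in Table 3 (mmol progesterone per mmol glucose) are related through the number of carbon atoms per molecule: progesterone ($\textrm{C}_{21}\textrm{H}_{30}\textrm{O}_{2}$) has 21 and glucose has 6, so $Y_{\textrm{C}} = Y \cdot 21/6$. The sketch below assumes that glucose is the only carbon source.

```python
# Carbon atoms per molecule: progesterone C21H30O2 (Figure 1), glucose C6H12O6
C_PROGESTERONE = 21
C_GLUCOSE = 6


def molar_to_cmol_yield(y_mol):
    """Convert mmol progesterone / mmol glucose to Cmol/Cmol."""
    return y_mol * C_PROGESTERONE / C_GLUCOSE


def cmol_to_molar_yield(y_cmol):
    """Convert Cmol/Cmol back to mmol progesterone / mmol glucose."""
    return y_cmol * C_GLUCOSE / C_PROGESTERONE


# Maximum molar yield of model 1 in Table 3 (standard medium conditions)
print(f"{molar_to_cmol_yield(0.0167):.3f} Cmol/Cmol")  # 0.058 Cmol/Cmol
# Molar yield corresponding to the carbon yield of 0.344 Cmol/Cmol
print(f"{cmol_to_molar_yield(0.344):.3f} mmol/mmol")   # 0.098 mmol/mmol
```

Converting both ways makes it easier to check which simulated conditions a given carbon yield belongs to before comparing it with the molar yields in Table 3.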

The success of implementing this heterologous pathway and these modifications in practice depends, first of all, on whether the enzymes we identified function efficiently in yeast, as the model assumes. Other simulation assumptions may not hold in reality either. For example, we assume that growth is limited by a single substrate, glucose, and that the system is at steady state, meaning that intracellular metabolite concentrations remain constant. In addition, the model describes the cell in limited detail. For example, it does not capture the effect of accumulating progesterone or intermediates such as squalene, which might be toxic to the cell and inhibit growth (Csáky et al. 2020; Xu et al. 2020).

For these reasons, the tools used in this report mainly help to identify a suitable heterologous pathway and gene targets for knock-outs, knock-downs, and upregulation. The calculated yields and titres can help assess the theoretical impact of the implemented modifications, but the numbers themselves should not be regarded as conclusive. Better estimates of productivity, yields, and titres could be obtained by extending the model, for example with enzymatic constraints (Domenzain et al. 2021). A more direct next step would be to move to the lab: first introduce the heterologous pathway, and then try to improve growth and productivity by engineering the gene targets identified in this report.








## 6. Conclusion

We computationally generated 12 progesterone-producing *S. cerevisiae* cell factories. 
Using phenotypic simulations, our best-performing strains reached a progesterone productivity of 0.167 mmol/(gDW\*h). Additionally, a batch fermentation simulation with an initial glucose concentration of 10 mmol/L and a constant $\textrm{O}_{2}$ uptake of 2 mmol/(gDW\*h) gave an estimated progesterone titre of 0.212 mmol/L.
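
The numbers above can be related with simple unit arithmetic: the specific productivity is the molar yield in Table 3 multiplied by the glucose uptake rate of 10 mmol/(gDW\*h), and a titre in mmol/L converts to mg/L through the molar mass of progesterone. The sketch below uses standard atomic masses, which are an assumption of this example rather than values taken from the report.

```python
# Standard atomic masses in g/mol (assumed values, not from the report)
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}
# Progesterone, C21H30O2 (Figure 1)
PROGESTERONE = {"C": 21, "H": 30, "O": 2}


def molar_mass(formula):
    """Molar mass in g/mol from an element -> atom count mapping."""
    return sum(ATOMIC_MASS[element] * n for element, n in formula.items())


def productivity(yield_mmol_per_mmol, glucose_uptake):
    """Specific productivity in mmol/(gDW*h): yield times glucose uptake rate."""
    return yield_mmol_per_mmol * glucose_uptake


def titre_mg_per_l(titre_mmol_per_l, formula=PROGESTERONE):
    """Convert a titre from mmol/L to mg/L (mmol/L x g/mol = mg/L)."""
    return titre_mmol_per_l * molar_mass(formula)


print(f"Molar mass: {molar_mass(PROGESTERONE):.2f} g/mol")            # 314.47 g/mol
print(f"Productivity: {productivity(0.0167, 10):.3f} mmol/(gDW*h)")   # 0.167 mmol/(gDW*h)
print(f"Titre: {titre_mg_per_l(0.212):.1f} mg/L")                     # 66.7 mg/L
```

The simulated titre of 0.212 mmol/L thus corresponds to roughly 67 mg/L (about 0.07 g/L), which is the form in which experimental titres are usually reported in the literature.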

Our work indicates that the heterologous steroid progesterone can in principle be produced using *S. cerevisiae* as host, and that computational design can help identify strategies for maximizing the yield.
Such production could be more sustainable than current production and could thus contribute to several of the UN Sustainable Development Goals (SDGs), which describe the steps necessary to ensure a sustainable future for all. Our work relates to SDG 3 (by promoting health), SDG 9 (by being innovative), and SDG 12 (by ensuring responsible production), and, if successful, could provide a better alternative for steroid production and contribute to a sustainable tomorrow.

![figures/SDGs.png](figures/SDGs.png)
**Figure 15.** Our project contributes to SDG3 - Good health and well-being (by promoting good health), SDG9 - Industry, innovation and infrastructure (by being innovative), and SDG12 - Responsible consumption and production (by ensuring sustainable production).

## References

Al Jasem, Yosef, et al. "Preparation of steroidal hormones with an emphasis on transformations of phytosterols and cholesterol-a review." Mediterranean Journal of Chemistry 3.2 (2014): 796-830.

Bachmann, W. E., Wayne Cole, and A. L. Wilds. "The total synthesis of the sex hormone equilenin and its stereoisomers." Journal of the American Chemical Society 62.4 (1940): 824-839.

Bartlett, Paul D., Frank Henry Westheimer, and G. Büchi. "Robert Burns Woodward, Nobel Prize in Chemistry for 1965." Science 150.3696 (1965): 585-587.

Batth, Rituraj, et al. "Biosynthesis and industrial production of androsteroids." Plants 9.9 (2020): 1144.

Buhaescu, Irina, and Hassane Izzedine. "Mevalonate pathway: a review of clinical and therapeutical implications." Clinical biochemistry 40.9-10 (2007): 575-584.

Cardoso, Joao GR, et al. "Cameo: a Python library for computer aided metabolic engineering and optimization of cell factories." ACS synthetic biology 7.4 (2018): 1163-1166.

Choi, Hyung Seok, et al. "In silico identification of gene amplification targets for improvement of lycopene production." Applied and environmental microbiology 76.10 (2010): 3097-3105. 

Csáky, Zsófia, et al. "Squalene lipotoxicity in a lipid droplet‐less yeast mutant is linked to plasma membrane dysfunction." Yeast 37.1 (2020): 45-62.

Domenzain, Iván, et al. "Reconstruction of a catalogue of genome-scale metabolic models with enzymatic constraints using GECKO 2.0." bioRxiv (2021).

Dong, Jingzhou, et al. "Direct biotransformation of dioscin into diosgenin in rhizome of Dioscorea zingiberensis by Penicillium dioscin." Indian journal of microbiology 55.2 (2015): 200-206.

Duarte, Natalie C., Markus J. Herrgård, and Bernhard Ø. Palsson. "Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model." Genome research 14.7 (2004): 1298-1309.

Howie, Peter W. "The progestogen-only pill." British journal of obstetrics and gynaecology 92.10 (1985): 1001-1002.

Jesus, Mafalda, et al. "Diosgenin: recent highlights on pharmacology and analytical methodology." Journal of analytical methods in chemistry 2016 (2016).

Jiang, Yi-qi, and Jian-ping Lin. "Recent progress in strategies for steroid production in yeasts." World Journal of Microbiology and Biotechnology 38.6 (2022): 1-14.

Jordá, Tania, and Sergi Puig. "Regulation of ergosterol biosynthesis in Saccharomyces cerevisiae." Genes 11.7 (2020): 795.

Malina, Carl, et al. "Adaptations in metabolism and protein translation give rise to the Crabtree effect in yeast." Proceedings of the National Academy of Sciences 118.51 (2021).

Nath, Anita, and Regine Sitruk-Ware. "Progesterone vaginal ring for contraceptive use during lactation." Contraception 82.5 (2010): 428-434.

Paramasivan, Kalaivani, and Sarma Mutturi. "Recent advances in the microbial production of squalene." World Journal of Microbiology and Biotechnology 38.5 (2022): 1-21.

Parapouli, Maria, et al. "Saccharomyces cerevisiae and its industrial applications." AIMS microbiology 6.1 (2020): 1.

Patil, Kiran Raosaheb, et al. "Evolutionary programming as a platform for in silico metabolic engineering." BMC bioinformatics 6.1 (2005): 1-12.

Slater, Leo B. "Industry and academy: The synthesis of steroids." Historical studies in the physical and biological sciences 30.2 (2000): 443-480.

Tong, Wang-Yu, and Xiang Dong. "Microbial biotransformation: recent developments on steroid drugs." Recent patents on biotechnology 3.2 (2009): 141-153.

Woodward, R. B., et al. "The total synthesis of steroids." Journal of the American Chemical Society 74.17 (1952): 4223-4251.

Xu, Shanhui, and Yanran Li. "Yeast as a promising heterologous host for steroid bioproduction." Journal of Industrial Microbiology and Biotechnology 47.9-10 (2020): 829-843.

Zhu, Zhan-Tao, et al. "Metabolic compartmentalization in yeast mitochondria: Burden and solution for squalene overproduction." Metabolic engineering 68 (2021): 232-245.