## Processing Examples

We will use the following structure to process examples:
1. Read the `examples` directory to get a list of all the example directories
2. For each `example`, do the following:
	1. Gather the list of issues in the `issues` directory
	2. For each issue:
		1. gather the relevant files from the `files` directory
		2. Generate a prompt based on the issue & files
		3. Try and retrieve a completion from OpenAI's completion endpoint:
			- If successful:
				1. Gather the updated files, write their contents to the original files in `files`
				2. Create a copy of the completed files in the `completions` directory 
			- Otherwise, stop processing the issues
	3. Write the last successfully processed issue to a `last-processed` file


## Reading File Data
We'll use an `examples` directory to describe our scenarios. Each scenario contains a `files` directory with the initial files we'll be referencing, and an `issues` directory whose subdirectories are named by issue number, e.g. `issues/1`, `issues/2`, etc. Each of those subdirectories holds an `issue.md` file describing the changes to be made, and a `completions` directory where the updated files are written.


For example:

```
last-processed.json
files/
	README.md
	index.js
	package.json
	package-lock.json
	generated-files/
		file-12345.yaml
		nfs-server-config.yaml
		nfs-pvc.yaml
issues/
	1/
		issue.md
		completions/
			index.js
			package.json
	2/
		issue.md
		completions/
			README.md
			package-lock.json
	3/
		issue.md
		completions/
			README.md
			index.js
			package.json
```
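
To make the layout concrete, here is a minimal sketch that builds a throwaway example directory in the system temp folder and summarizes what it finds. The helper name `describeExample` and the sample file contents are assumptions made for illustration; they are not part of the notebook.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Build a throwaway example directory that follows the layout above.
const exampleDir = fs.mkdtempSync(path.join(os.tmpdir(), 'example-'));
fs.mkdirSync(path.join(exampleDir, 'files'), { recursive: true });
fs.writeFileSync(path.join(exampleDir, 'files', 'README.md'), '# demo\n');
for (const issueNo of ['1', '2']) {
	const issueDir = path.join(exampleDir, 'issues', issueNo);
	fs.mkdirSync(issueDir, { recursive: true });
	fs.writeFileSync(path.join(issueDir, 'issue.md'), `Issue ${issueNo}\n`);
}

// Hypothetical helper: summarize which parts of the layout are present.
const describeExample = (dir) => ({
	hasFiles: fs.existsSync(path.join(dir, 'files')),
	hasLastProcessed: fs.existsSync(path.join(dir, 'last-processed.json')),
	issueNumbers: fs.readdirSync(path.join(dir, 'issues'))
		.filter((name) => fs.existsSync(path.join(dir, 'issues', name, 'issue.md')))
		.map(Number)
		.sort((a, b) => a - b),
});

const summary = describeExample(exampleDir);
console.log(summary);
```

A freshly created example has no `last-processed.json` yet, which corresponds to the "start at the first issue" case.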



In [32]:
// declare our variables up here
let prompt, files, issue, i, examples;
const OPENAI_API_URL = 'https://api.openai.com/v1/engines/code-davinci-001/completions';


undefined

In [33]:
// import libraries
const path = require('path');
const fs = require('fs');

undefined

In [34]:
// list the example directories under ./examples
let localFiles = fs.readdirSync('./examples');

undefined

## Processing Issues

Issues provide us with critical information about how exactly to generate files.
There are two scenarios that could happen:

1. The issue references files using the syntax `@filetag:/path/to/file`, in which case we assume the issue wants the existing files modified rather than generated from scratch
2. No files are referenced, in which case we attempt to generate new files that match the issue's specification
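
As a rough illustration of the two scenarios, the sketch below scans a made-up issue text for backtick-wrapped `@filetag:/path` references and picks a mode based on whether any were found. The sample issue, the tags, the paths, and the `mode` labels are all hypothetical.

```javascript
// Hypothetical issue text; the tags and paths are made up for illustration.
const sampleIssue = [
	'Please raise the CPU quota in `@quota:/namespaces/demo/resourcequota.yaml`',
	'and keep `@kustomization:/namespaces/demo/kustomization.yaml` in sync.',
].join('\n');

// Backtick-wrapped `@tag:path` references; the path stops at the closing backtick.
const referencePattern = /`@([a-zA-Z0-9_\-]+):([^`]+)`/g;

const references = [...sampleIssue.matchAll(referencePattern)]
	.map(([, tag, filePath]) => ({ tag, filePath }));

// An issue with no references falls into the second scenario (generate new files).
const mode = references.length > 0 ? 'update' : 'generate';
console.log(mode, references);
```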

In [35]:
// return a map of {fileName => {path: path, content: content, updatedContent: updatedContent}}
const getFilesFromIssue = (issue) => {
	// match backtick-wrapped references of the form `@filetag:/path/to/file`;
	// [^`]+ stops at the closing backtick, so two references on one line stay separate
	const fileRegex = /`(@([a-zA-Z0-9_\-]+):([^`]+))`/g;
	// create a map from the filename to the filepath
	const fileMap = new Map();
	let match;
	while ((match = fileRegex.exec(issue)) !== null) {
		// group 2 is the file tag, group 3 is the path
		// (named filePath so it doesn't shadow the `path` module)
		const [name, filePath] = [match[2], match[3]];
		if (!fileMap.has(name)) {
			// define the file object here
			fileMap.set(name, {
				path: filePath,
				content: '',
				updatedContent: '',
			});
		} else {
			console.error(`duplicate file name ${name}`);
		}
	}
	return fileMap;
}


undefined

In [36]:
let getFilenamesFromIssue;

undefined

### Populating Files

When the issue references files, we extract their paths with a regex and then read each file's contents from disk using its `path` attribute.

In [37]:
const populateFiles = (fileMap, rootDir) => {
	// look through the fileMap and read the contents of the file at the given path
	for (const [_, file] of fileMap) {
		let searchPath = path.join(rootDir, file.path);
		file.content = fs.readFileSync(searchPath, 'utf8');
	}
	return fileMap;
}

undefined
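
Note that `fs.readFileSync` throws if a referenced path does not exist. If you would rather collect the missing references than abort, a variant along the following lines might help. This is only a sketch: `populateExistingFiles` and the sample entries are hypothetical and not part of the notebook.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Temporary root directory holding one real file.
const rootDir = fs.mkdtempSync(path.join(os.tmpdir(), 'files-'));
fs.writeFileSync(path.join(rootDir, 'present.yaml'), 'kind: Demo\n');

// Same shape as the file map used above; the entries are hypothetical.
const fileMap = new Map([
	['present', { path: 'present.yaml', content: '', updatedContent: '' }],
	['missing', { path: 'missing.yaml', content: '', updatedContent: '' }],
]);

// Hypothetical variant: read what exists and report what does not.
const populateExistingFiles = (files, root) => {
	const missing = [];
	for (const [tag, file] of files) {
		const searchPath = path.join(root, file.path);
		if (fs.existsSync(searchPath)) {
			file.content = fs.readFileSync(searchPath, 'utf8');
		} else {
			missing.push(tag);
		}
	}
	return missing;
};

const missingTags = populateExistingFiles(fileMap, rootDir);
console.log(missingTags);
```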

In [38]:
let getIssuesForDirectory;

undefined

## Obtaining Issues From a Directory

With each directory we look at, we go through the contents of the `issues` directory, which contains statements regarding modifications that should be made to the files.

We then read the `issueNumber` stored in `last-processed.json`; if the file is not present, we start at the first issue.

Issues are processed in ascending order of their number in the `issues` directory.
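
The ordering and resume logic can be sketched without touching the file system. The directory names below are made up, and `resumeFrom` and `pendingIssues` are hypothetical helpers. The point is that directory names are strings, so they need a numeric sort for issue 2 to come before issue 10.

```javascript
// Directory names as fs.readdirSync might return them (strings, not numbers).
const issueDirs = ['1', '10', '2', '3'];

// Stand-in for the parsed contents of last-processed.json; `null` means the file is absent.
const resumeFrom = (lastProcessed) => (lastProcessed ? lastProcessed.issueNumber : 0);

const pendingIssues = (dirs, lastProcessed) => dirs
	.map((name) => parseInt(name, 10))
	.filter((n) => !Number.isNaN(n))      // ignore non-numeric directory names
	.sort((a, b) => a - b)                // numeric order, so 2 comes before 10
	.filter((n) => n > resumeFrom(lastProcessed));

console.log(pendingIssues(issueDirs, null));
console.log(pendingIssues(issueDirs, { issueNumber: 2 }));
```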


In [39]:
getIssuesForDirectory = (dir) => {
	/* return a list of issue objects of the form: 
		{
			relevantFiles: Map<string, {path, content, updatedContent}>,
			content: string,
			issueNumber: number
		}
	*/ 
	let issues = fs.readdirSync(dir).map((issueNo) => {
		let issuePath = path.join(dir, issueNo);
		let issue = fs.readFileSync(path.join(issuePath, 'issue.md'), 'utf8');
		return {
			relevantFiles: getFilesFromIssue(issue),
			content: issue,
			// convert the issueNo to a number
			issueNumber: parseInt(issueNo, 10)
		};
	});
	// sort the issues by issue number
	issues.sort((a, b) => a.issueNumber - b.issueNumber);
	return issues;
}

[Function: getIssuesForDirectory]

## Building the prompt

We build a new prompt for each issue depending on whether or not it has referenced any files.

When no files are referenced, we build a prompt using the following structure:

```
# Preamble defining the document
1. Description of the issue
2. A list of new files that are created to address the issue
```

When files are referenced, we instead use the following structure:
```
# Preamble defining the document
1. Description of the issue
2. The contents of the files being referenced, prefixed by their `@filetag`
3. The files after being updated to address the issue of #1, prefixed by their `@filetag`
```
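
To see roughly what the two structures look like as strings, here is a small sketch that assembles both from in-memory inputs. The preamble text, the issue text, the tags, and the `sketchPrompt` helper are assumptions for illustration. The real prefixes are read from files.

```javascript
// Hypothetical inputs: a preamble, an issue description, and tagged file contents.
const preamble = '# Preamble defining the document';
const issueText = 'Increase the CPU limit to 4.';
const taggedFiles = new Map([
	['quota', 'kind: ResourceQuota\n'],
	['kustomization', 'kind: Kustomization\n'],
]);

const sketchPrompt = (prefix, issue, files) => {
	if (files.size === 0) {
		return `${prefix}\n\n## 1. Description of issue:\n${issue}\n\n## 2. New files:\n`;
	}
	// Tag each file and separate consecutive files with a `---` line.
	const originals = [...files]
		.map(([tag, content]) => `# @${tag}\n${content}`)
		.join('---\n');
	return `${prefix}\n\n## 1. Description of issue:\n${issue}\n\n`
		+ `## 2. Original files:\n${originals}\n## 3. Updated files:\n`;
};

const updatePrompt = sketchPrompt(preamble, issueText, taggedFiles);
const newFilePrompt = sketchPrompt(preamble, issueText, new Map());
console.log(updatePrompt);
```

Ending the prompt on the heading of the section to be written leaves the model to continue with the file contents.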


In [40]:
let buildPrompt;

undefined

In [82]:
// @issue: string
// @files: map of {fileName: string => {path: string, content: string, updatedContent: string}}
buildPrompt = (issue, files) => {
	// the prefix is read from 'new-prefix.md' or 'update-prefix.md', depending on the scenario
	let prefix, prompt;
	// if files is empty, we use the new file prefix
	if (files.size === 0) {
		prefix = fs.readFileSync('./new-prefix.md', 'utf8');
		prompt = `${prefix}

## 1. Description of issue:
${issue}

## 2. New files:\n`;
	} else {
		prefix = fs.readFileSync('./update-prefix.md', 'utf8');
		// the template lines are kept flush left so no stray tabs end up in the prompt
		prompt = `${prefix}
## 1. Description of issue:
${issue}

## 2. Original files:
`;

		// use a local counter rather than the notebook-level `i`
		let i = 0;
		for (const [fileName, file] of files) {
			prompt += `# @${fileName}\n${file.content}\n`;
			// only place the delimiting string if in-between files
			if (files.size > 1 && i < files.size - 1) {
				prompt += '---\n';
			}
			i++;
		}

		prompt += `
## 3. Updated files:
`;
	}
	return prompt;
}

[Function: buildPrompt]

## Updating Files

We retrieve a completion from OpenAI's completion endpoint and split the files up by a '---' delimiter,
then we'll match them to their corresponding files.

Let's define a few functions to help us with this. We'll bring in the `axios` package to make our HTTP requests.

In [42]:
let completionToFiles, getCompletion;
var axios = require('axios');


undefined

In [86]:
var yaml = require('js-yaml');

undefined

## Transforming the Completion From OpenAI Into Files

After OpenAI returns a completion for a given issue, we'll then need to parse the contents and transform it back into a useful format which can be mapped to the files.

At this point, there are two scenarios:
1. OpenAI returned a nice response (hooray!)
2. OpenAI has returned a bunch of junk

When OpenAI returns something nice, the format for YAMLs will be the following for existing files:
```yaml
# @bobfile
kind: Human
metadata:
  name: bob
  age: 24
  namespace: bobville
```

For new files:
```yaml
kind: Human
metadata:
  name: bob
  age: 24
  namespace: bobville
```

But on junk responses, we don't know what we'll get.
To work around this, we try to parse each piece of the response as YAML; if that fails, we treat it as a plain string and keep it as-is.
We check whether a piece is junk by stripping all whitespace and seeing if the length is 0. If anything remains, there is still content and we should save it. This isn't foolproof, but it's a good first approximation.
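
The sketch below shows one way to sort the pieces of a completion into the three cases described here (tagged for an existing file, empty, or untagged new content) before any YAML parsing happens. The sample completion and the `classifyChunk` helper are hypothetical, and the tag pattern is an assumption that mirrors the `@filetag` syntax.

```javascript
// Hypothetical completion text containing a tagged file, an empty chunk, and an untagged chunk.
const sampleCompletion = [
	'# @bobfile',
	'kind: Human',
	'---',
	'   ',
	'---',
	'kind: Human',
	'metadata:',
	'  name: alice',
].join('\n');

// Classify each `---`-separated chunk without parsing it.
const classifyChunk = (chunk) => {
	if (chunk.replace(/\s/g, '').length === 0) {
		return { kind: 'empty' };
	}
	const tagMatch = /^#\s*@([a-zA-Z0-9_\-]+)/m.exec(chunk);
	return tagMatch
		? { kind: 'existing', tag: tagMatch[1] }
		: { kind: 'new' };
};

const classified = sampleCompletion.split('---').map(classifyChunk);
console.log(classified);
```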


In [117]:
// process the completion & place it into the files' updatedContent field
completionToFiles = (completion, files) => {
	let completions = completion.split('---');
	// go through the list of completions, extract the filename and set the updated content
	for (const cmpltn of completions) {
		// extract the file tag from the completion; the tag uses the same character set
		// as in getFilesFromIssue, so trailing whitespace is not captured
		const fileTagRegex = /#\s*@([a-zA-Z0-9_\-]+)/;
		const match = fileTagRegex.exec(cmpltn);
		if (match !== null) {
			const fileTag = match[1];
			if (files.has(fileTag)) {
				// find the line containing the fileTag and remove all lines up to and including the fileTag
				const lines = cmpltn.split('\n');
				let i = 0;
				for (const line of lines) {
					i++;
					if (line.includes(fileTag)) {
						break;
					}
				}
				// remove the lines from the completion
				const newCompletion = lines.slice(i).join('\n');
				// set the updated content
				files.get(fileTag).updatedContent = newCompletion;
			}
		} else {
			// check if the file is empty by stripping all whitespace & seeing if any characters are left
			const stripped = cmpltn.replace(/\s/g, '');
			if (stripped.length === 0) {
				// skip this
				continue;
			}

			// create a new file & map it to the name found in .metadata.name
			let yamlResource;
			try {
				yamlResource = yaml.load(cmpltn);
			} catch (e) {
				// not valid YAML: keep the raw text and fall back to a random name below
				yamlResource = undefined;
			}
			// try to retrieve .metadata.name, else just default to a random name
			const randomName = () => `file-${Math.floor(Math.random() * 10000000000)}`;
			let name = yamlResource?.metadata?.name || randomName();
			// if the files map already has an object with this name, keep generating a new one
			while (files.has(name)) {
				name = randomName();
			}
			files.set(name, {
				path: `generated-files/${name}.yaml`,
				content: cmpltn,
				updatedContent: cmpltn
			});
		}
	}
};


[Function: completionToFiles]

## Obtaining Completions

This is the easiest part of the process: we just send our prompt to OpenAI's completions endpoint and await a successful response.
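
Before looking at the request itself, it can help to see how the request body and its defaults might be assembled. The sketch below makes no network call. `buildCompletionRequest` is a hypothetical helper, and the field names (`prompt`, `max_tokens`, `stop`, `temperature`) mirror the ones used in this notebook's request.

```javascript
// Hypothetical helper that assembles the request body with fallback defaults.
const buildCompletionRequest = (prompt, maxTokens, stopSequences) => ({
	prompt,
	// `??` falls back only when the argument is null or undefined.
	max_tokens: maxTokens ?? 512,
	stop: stopSequences ?? ['####'],
	temperature: 0,
});

const defaulted = buildCompletionRequest('## 3. Updated files:\n');
const custom = buildCompletionRequest('## 2. New files:\n', 256, ['## End of document']);
console.log(defaulted.max_tokens, custom.max_tokens);
```

A logical fallback (`??` or `||`) is the safe choice for defaults here. A bitwise `|` would combine the bits of the two numbers instead of choosing one, so `256 | 512` evaluates to `768`.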

In [114]:

// to create the completion
getCompletion = async (prompt, maxTokens, stopSequences) => {
	stopSequences = stopSequences || ['####'];
	const headers = {
		// get OPENAI_API_KEY from env
		"Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
		"Content-Type": "application/json",
	};
	// console.log("headers", headers);
	const body = {
		prompt: prompt,
		// logical OR supplies the default; bitwise `|` would merge the bits of both numbers
		max_tokens: maxTokens || 512,
		stop: stopSequences,
		temperature: 0,
		top_p: 1,
		frequency_penalty: 0,
		presence_penalty: 0,
	};
	let completion;

	// request the openai api using axios
	const response = await axios.post(OPENAI_API_URL, body, { headers });
	// keep the text of the first choice, if there is one
	if (response.status === 200 && response.data.choices) {
		if (response.data.choices.length > 0) {
			completion = response.data.choices[0].text;
		} else {
			console.error("no completion found");
		}
	}
	return completion;
};



[AsyncFunction: getCompletion]

In [45]:
const updateFilesFromCompletion = async (files, prompt, maxTokens) => {
	const completion = await getCompletion(prompt, maxTokens, ['####',]);
	completionToFiles(completion, files);
}

undefined

In [46]:
let writeFilesToCompletionsDir;

undefined

## Saving the Files From Completion

Once we have obtained a completion, we'll need to save the new contents into the `completions` directory for our current issue, and copy the results into the original `files` directory.
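
As a small illustration of this double write, the sketch below saves the same updated content under an issue's `completions` directory and under the example's `files` directory, using a temporary folder. The `writeUpdated` helper and the sample file are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Temporary example directory with one issue; all names are illustrative.
const baseDir = fs.mkdtempSync(path.join(os.tmpdir(), 'save-'));
const issueDir = path.join(baseDir, 'issues', '1');
fs.mkdirSync(issueDir, { recursive: true });

const updatedFiles = new Map([
	['quota', { path: 'nested/quota.yaml', content: 'cpu: 2\n', updatedContent: 'cpu: 4\n' }],
]);

// Hypothetical helper: write each file's updatedContent under the given root.
const writeUpdated = (files, root) => {
	for (const file of files.values()) {
		const outputPath = path.join(root, file.path);
		// create parent directories first, since writeFileSync does not create them
		fs.mkdirSync(path.dirname(outputPath), { recursive: true });
		fs.writeFileSync(outputPath, file.updatedContent);
	}
};

writeUpdated(updatedFiles, path.join(issueDir, 'completions')); // per-issue snapshot
writeUpdated(updatedFiles, path.join(baseDir, 'files'));        // working copy for later issues

const snapshot = fs.readFileSync(path.join(issueDir, 'completions', 'nested', 'quota.yaml'), 'utf8');
const workingCopy = fs.readFileSync(path.join(baseDir, 'files', 'nested', 'quota.yaml'), 'utf8');
console.log(snapshot === workingCopy);
```

Keeping a per-issue snapshot means each issue's result can be inspected later, even after subsequent issues have modified the working copy in `files`.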

In [47]:
// filesMap: map of {fileName: string => {path: string, content: string, updatedContent: string}}
// basePath: string
writeFilesToCompletionsDir = (filesMap, basePath) => {
	// delete the completions directory if it exists
	if (fs.existsSync(path.join(basePath, 'completions'))) {
		fs.rmSync(path.join(basePath, 'completions'), { recursive: true });
	}

	// write the updated files into the completions directory using their same path as the original files
	for (const [_, file] of filesMap) {
		let outputPath = path.join(basePath, 'completions', file.path);
		// create parent directories, if needed, then write the file
		fs.mkdirSync(path.dirname(outputPath), { recursive: true });
		fs.writeFileSync(outputPath, file.updatedContent);
	}
}

[Function: writeFilesToCompletionsDir]

In [48]:
let writeUpdatedContentToFiles;

undefined

In [108]:
writeUpdatedContentToFiles = (filesMap, basePath) => {
	// go through each file and write the updated content to the file
	for (const [_, file] of filesMap) {
		let outputPath = path.join(basePath, 'files', file.path);
		// create the base directory if it doesn't exist
		fs.mkdirSync(path.dirname(outputPath), { recursive: true });
		// writeFileSync has no `recursive` option; the mkdirSync call above creates the parents
		fs.writeFileSync(outputPath, file.updatedContent);
	}
}

[Function: writeUpdatedContentToFiles]

In [50]:
let populateFileMap;

undefined

In [62]:
populateFileMap = (filesMap, basePath) => {
	// populate the example's files from the main files
	for (const [_, file] of filesMap) {
		// read the file from the file.path and set the content
		let filePath = path.join(basePath, file.path);
		file.content = fs.readFileSync(filePath, 'utf8');
	}	
}

[Function: populateFileMap]

Now we'll run the file updater

In [52]:
let processExample;

undefined

## Processing Examples

Now we just bring all of the steps from above together:
1. Read the `examples` directory to get a list of all the example directories
2. For each `example`, do the following:
	1. Gather the list of issues in the `issues` directory
	2. For each issue:
		1. gather the relevant files from the `files` directory
		2. Generate a prompt based on the issue & files
		3. Try and retrieve a completion from OpenAI's completion endpoint:
			- If successful:
				1. Gather the updated files, write their contents to the original files in `files`
				2. Create a copy of the completed files in the `completions` directory 
			- Otherwise, stop processing the issues
	3. Write the last successfully processed issue to a `last-processed` file

In [72]:
processExample = async (baseDir) => {
	// first check to see if a last-processed.json file exists
	let lastProcessedFile = path.join(baseDir, 'last-processed.json');
	let lastProcessed;
	if (fs.existsSync(lastProcessedFile)) {
		lastProcessed = JSON.parse(fs.readFileSync(lastProcessedFile, 'utf8'));
	} else {
		lastProcessed = {
			issueNumber: 0,
		};
	}

	// load the issues 
	let issues = getIssuesForDirectory(path.join(baseDir, 'issues'));
	
	// we need to process all issues whose number is greater than the last processed issue number
	let issuesToProcess = issues.filter((issue) => issue.issueNumber > lastProcessed.issueNumber);


	try {
		// process each issue until failure or completion
		for (let issue of issuesToProcess) {
			// clean out all of the issue's files 
			let issuePath = path.join(baseDir, 'issues', issue.issueNumber.toString());
			
			// delete everything recursively EXCEPT issue.md
			for (const file of fs.readdirSync(issuePath)) {
				if (file !== 'issue.md') {
					fs.rmSync(path.join(issuePath, file), { recursive: true });
				}
			}
			// read the current contents of the files referenced by the issue
			populateFileMap(issue.relevantFiles, path.join(baseDir, 'files'));


			// generate a prompt from the issue's content and files
			// write the prompt to a file
			let initialPrompt = buildPrompt(issue.content, issue.relevantFiles);
			fs.writeFileSync(path.join(issuePath, 'prompt.md'), initialPrompt);

			// obtain a completion & write it to file
			let completion = await getCompletion(initialPrompt, 512, ['####','## End of document']);
			fs.writeFileSync(path.join(issuePath, 'completion.md'), [initialPrompt, completion].join(''));

			// convert the completion to the issue's files
			completionToFiles(completion, issue.relevantFiles);			
			writeFilesToCompletionsDir(issue.relevantFiles, issuePath);
			writeUpdatedContentToFiles(issue.relevantFiles, baseDir);

			// update the last processed issue number
			lastProcessed.issueNumber = issue.issueNumber;
		}
	} catch(e) {
		console.log(e);
	} finally {
		// write the last processed issue number to a file
		fs.writeFileSync(lastProcessedFile, JSON.stringify(lastProcessed));
	}
};

[AsyncFunction: processExample]

In [54]:
let processExamples;

undefined

In [55]:
processExamples = async () => {
	const examplesDir = './examples';
	const examples = fs.readdirSync(examplesDir);
	for (const dirname of examples) {
		console.log('processing example: ', dirname);
		let exampleDir = path.join(examplesDir, dirname);
		// read the last issue processed from the example directory
		await processExample(exampleDir);
	}
}

[AsyncFunction: processExamples]

In [120]:
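$$.async();
{
	// progress spinner, declared locally instead of leaking an implicit global
	const rod = (function rod() {
		const chars = "|/-\\";
		let i = 0;
		return function() {
			i = (i + 1) % 4;
			// We need to use process.stdout.write since console.log automatically adds a \n to the end of lines
			process.stdout.write(` ${chars[i]}\r`);
		}
	})();
	const spinner = setInterval(rod, 250);
	// stop the spinner and release the cell once processing ends
	processExamples().finally(() => {
		clearInterval(spinner);
		$$.done();
	});
}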

processing example:  request-more-cpu
filePath: examples/request-more-cpu/files/cluster-scope/base/core/namespaces/training-model/kustomization.yaml
basePath: examples/request-more-cpu/files
file.path: cluster-scope/base/core/namespaces/training-model/kustomization.yaml
filePath: examples/request-more-cpu/files/cluster-scope/base/core/namespaces/training-model/resourcequota.yaml
basePath: examples/request-more-cpu/files
file.path: cluster-scope/base/core/namespaces/training-model/resourcequota.yaml
relevantFiles:  Map(2) {
  'kustomization' => {
    path: 'cluster-scope/base/core/namespaces/training-model/kustomization.yaml',
    content: 'apiVersion: kustomize.config.k8s.io/v1beta1\n' +
      'kind: Kustomization\n' +
      'resources:\n' +
      '    - namespace.yaml\n' +
      '    - resourcequota.yaml\n' +
      'components:\n' +
      '    - ../../../../components/project-admin-rolebindings/octo-training-model\n' +
      '    - ../../../../components/limitranges/default\n' +
     

undefined
