
# Generating Bash Code with Granite Code and Replicate


> **NOTE:** This recipe assumes you are working on a Linux, MacOS, or other UNIX-compatible system. While we haven't tested on Windows, some of the examples may generate valid DOS or PowerShell output. See comments below.


In [1]:
!pip install git+https://github.com/ibm-granite-community/utils

Collecting git+https://github.com/ibm-granite-community/utils
  Cloning https://github.com/ibm-granite-community/utils to /private/var/folders/nc/jrql4k0n2j73h7xktzxdr4pr0000gn/T/pip-req-build-a4djcg8w
  Running command git clone --filter=blob:none --quiet https://github.com/ibm-granite-community/utils /private/var/folders/nc/jrql4k0n2j73h7xktzxdr4pr0000gn/T/pip-req-build-a4djcg8w
  Resolved https://github.com/ibm-granite-community/utils to commit 5bd776c33c38e4945434c6eb79796b4358b0d0ef
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting requests==2.32.3 (from ibm_granite_community==0.1.0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
INFO: pip is looking at multiple versions of ibm-cos-sdk-core to determine which version is compatible with other requirements. This could take a while.
Collecting ibm-cos-sdk<2.14.0,>=2.12.0 (from ibm-watsonx-ai

### Select a model

Two Granite Code models are available in the [`ibm-granite`](https://replicate.com/ibm-granite) org on Replicate:

`ibm-granite/granite-8b-code-instruct-128k`

`ibm-granite/granite-20b-code-instruct-8k`

In [2]:
from langchain_community.llms import Replicate
from ibm_granite_community.notebook_utils import get_env_var

model = Replicate(
    model="ibm-granite/granite-20b-code-instruct-8k",
    replicate_api_token=get_env_var('REPLICATE_API_TOKEN'),
)


## Zero-shot Prompt with Granite Code 20b

In zero-shot prompting, you provide the model with a question and no examples. The model generates an answer based only on its training. Larger models tend to do better at this task.

Let's write a helper function that we'll use for all our queries. It determines the name of our operating system and shell so we can use those strings in queries. This matters because shell commands sometimes have different options on Linux vs. MacOS, etc. We'll write our queries so they take this difference into account. Note that `platform.system()` returns `Windows` on Windows systems.

> **TIPS:** If you are using MacOS, you can install Linux-compatible versions of many commands. Consider these two options:
> * Install GNU Coreutils on a Mac. See [these instructions](https://superuser.com/questions/476575/replace-os-xs-shell-commands-with-the-linux-versions).
> * Install [Homebrew](https://brew.sh/) and use it to install Linux-compatible (and other) tools.
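
To make the Linux vs. MacOS difference concrete, here is a minimal sketch that is not part of the original recipe. It assumes the stock BSD `date` on MacOS and GNU `date` on Linux; the table `WEEK_AGO_COMMANDS` and the function `week_ago_command` are hypothetical names used only for illustration.

```python
# Illustrative only: the same task ("timestamp of one week ago") needs
# different flags depending on which `date` implementation is installed.
WEEK_AGO_COMMANDS = {
    "MacOS": "date -v-7d +%s",            # BSD date: -v adjusts the current time
    "Linux": "date -d '7 days ago' +%s",  # GNU date: -d parses a relative date string
}

def week_ago_command(os_label: str) -> str:
    """Return the shell command for the given OS label, or raise if unknown."""
    try:
        return WEEK_AGO_COMMANDS[os_label]
    except KeyError:
        raise ValueError(f"No known 'date' variant for OS {os_label!r}") from None

print(week_ago_command("MacOS"))
print(week_ago_command("Linux"))
```

If you install GNU Coreutils on a Mac, the GNU form may be available under a different command name, so check which `date` your shell actually resolves.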

In [3]:
import platform

def os_name():
    system = platform.system()
    # "MacOS" works better in prompts than "Darwin", which is what platform.system()
    # returns on MacOS. All other values are fine as is, so we only remap Darwin.
    name_map = {'Darwin': 'MacOS'}
    # If you use PowerShell on Windows, change `DOS` to `PowerShell`.
    shell_map = {'Windows': 'DOS'}
    # The second argument to get() is the default return value.
    # For the shell name, return `bash` by default. (You can change this to zsh, fish, etc.)
    return name_map.get(system, system), shell_map.get(system, 'bash')

In [4]:
my_os, my_shell = os_name()
print(f"My OS is {my_os}. My shell is {my_shell}.")

My OS is MacOS. My shell is bash.


Note how we add additional context to the user's input prompt, such as _"make sure you write code that works for _my_ system!"_ (We'll see another way to do this below.)

For more on combining prompt parts, see the LangChain guide to [prompt composition](https://python.langchain.com/docs/how_to/prompts_composition/).

In [27]:
from textwrap import dedent

prompt = dedent(f"""\
    Show me a {my_shell} script to print the first 50 files found under the current working directory
    that have been modified within the last week. Make sure you show the last modification time
    for each file in the output. Make sure you generate {my_shell} code that is {my_os}-compatible!"""
)

response = model.invoke(prompt)
print(response)

Here's a bash script that should do what you're asking for:
```
#!/bin/bash
# Set the time limit to one week ago
time_limit=$(date -v-7d +%s)
# Loop through all files in the current directory and print their last modification time
# if it's within the last week
for file in $(find . -type f -mtime +$time_limit); do
 echo "$(date -r $file) $file"
done | head -n 50
```
This script uses the `find` command to search for all files in the current directory and its subdirectories that have been modified within the last week. The `-mtime +$time_limit` option specifies that the files must have been modified more than 7 days ago.
The `date -r` command is used to print the last modification time of each file, and the `head` command is used to limit the output to the first 50 files.
Note that this script assumes that you're running it on a MacOS system. If you're running it on a Linux system, you may need to modify the `find` command to use the `-mmin` option instead of `-mtime`. 


### Try the script

Remove any markdown formatting in the output and paste the commands generated into the next cell _**after the %%bash line shown**_. Also delete the `ls -l`, which is there to allow the cell to run without error if nothing is pasted there (e.g., in our CI/CD test system). So, for example, you might have something like the following:

```shell
%%bash
find dir -type d | do_something
...
```

The `%%bash` "magic" tells Jupyter to run the commands as a shell script instead of as Python code. You can omit lines like `#!/bin/bash` and keep or remove any comments `# this is a comment...`.

Does the script work? If not, try running the query again. Also try modifying the query string. What difference do these steps make?
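
The manual clean-up described above could also be scripted. The following is a minimal sketch, not part of the original recipe: `extract_code_blocks` and `strip_shebang` are hypothetical helper names, and `sample_response` is a made-up response used only to exercise them. It assumes the model wraps code in triple-backtick fences, which may not always be the case.

```python
import re

FENCE = "`" * 3  # a markdown code fence, built here to avoid a literal fence in this example
# Match an opening fence (with an optional language tag), then capture lazily up to the closing fence.
FENCE_RE = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)

def extract_code_blocks(response: str) -> list[str]:
    """Return the body of each fenced block in the response, without the fence lines."""
    return [body.strip("\n") for body in FENCE_RE.findall(response)]

def strip_shebang(script: str) -> str:
    """Drop a leading '#!' line, which a %%bash cell does not need."""
    lines = script.splitlines()
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    return "\n".join(lines)

sample_response = (
    "Here's a script:\n"
    + FENCE + "\n"
    + "#!/bin/bash\n"
    + "find . -type f -mtime -7 | head -n 50\n"
    + FENCE + "\n"
    + "This script uses `find`.\n"
)

blocks = extract_code_blocks(sample_response)
print(strip_shebang(blocks[0]))
```

Always read the extracted commands before running them; generated shell code can be wrong or destructive.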

In [7]:
%%bash
ls -l

total 20
-rw-r--r-- 1 fayvor staff 19404 Oct  1 11:24 Text_to_Shell.ipynb


The next recipe, [../Text_to_Shell_Exec](../Text_to_Shell_Exec/Text_to_Shell_Exec.ipynb), explores executing generated shell code. We recommend studying it after this one.

## Few-shot Prompting with Granite Code 20b

In few-shot prompting, you provide the model with a question and some examples. The model will generate an answer given its training. The additional examples help the model zero in on a pattern, which may be required for smaller models to perform well at this task.
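
Stripped of any framework, a few-shot prompt is just example question and answer pairs followed by the real question. Here is a minimal plain-Python sketch of that layout; `build_few_shot_prompt` and `demo_examples` are hypothetical names, and the `Question:`/`Answer:` labels are one possible convention rather than a requirement of the model.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Lay out each (question, answer) pair, then the new question with an open answer slot."""
    parts = [f"Question:\n{q}\nAnswer:\n{a}\n" for q, a in examples]
    # The final block ends at "Answer:" so the model continues from there.
    parts.append(f"Question:\n{question}\nAnswer:")
    return "\n".join(parts)

demo_examples = [
    ("Find all the empty directories in and under the current directory.",
     "find . -type d -empty"),
    ("Recursively find files that match '*.js', and filter out files with 'excludeddir' in their paths.",
     "find . -name '*.js' | grep -v excludeddir"),
]

few_shot = build_few_shot_prompt(
    demo_examples,
    "Count the lines in every '*.py' file under the current directory.",
)
print(few_shot)
```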

One of the examples uses the `stat` command, which requires different syntax for Linux vs. MacOS systems.

> **NOTE:** If you are using a Windows system, try changing the "answers" in the `examples` cell to be valid PowerShell or DOS commands. You can ignore the `stat_flags` in the next cell.

In [8]:
stat_flags = '-c "%y %n" {}'
if my_os == 'MacOS':
    stat_flags = '-f "%m %N" {}'
print(f"The 'stat' flags for my OS \'{my_os}\' and shell \'{my_shell}\' are \'{stat_flags}\'")

The 'stat' flags for my OS 'MacOS' and shell 'bash' are '-f "%m %N" {}'


Here we build up a prompt template from reusable parts. See the [LangChain PipelinePrompt docs](https://python.langchain.com/docs/how_to/prompts_composition/#using-pipelineprompt).
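
One detail to watch when composing templates: the `stat` flags contain a literal `{}`, and format-style templates treat braces as variable placeholders. The sketch below uses only Python's built-in `str.format` to show the problem and the usual fix of doubling the braces. It assumes the template engine follows the same brace rules as `str.format`; `escape_braces` and `stat_flags_example` are hypothetical names.

```python
def escape_braces(text: str) -> str:
    """Double every brace so a format-style template treats it as literal text."""
    return text.replace("{", "{{").replace("}", "}}")

stat_flags_example = '-f "%m %N" {}'  # same shape as the MacOS flags above

# Unescaped: the bare {} is read as a positional placeholder with nothing to fill it.
try:
    ("stat " + stat_flags_example + " | tail -n {count}").format(count=7)
except IndexError:
    print("Unescaped braces are treated as a placeholder and formatting fails.")

# Escaped: {{}} survives formatting as a literal {}.
template = "stat " + escape_braces(stat_flags_example) + " | tail -n {count}"
rendered = template.format(count=7)
print(rendered)
```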

In [28]:
from langchain_core.prompts import PipelinePromptTemplate, PromptTemplate

final_template = PromptTemplate.from_template(dedent("""
    {examples}
    Question:
    {prompt} {admonishment}
    Answer:"""
))

# The stat flags contain a literal `{}`. Double the braces so that PromptTemplate
# does not treat them as an (empty) template variable.
escaped_stat_flags = stat_flags.replace('{', '{{').replace('}', '}}')

examples_template = PromptTemplate.from_template(dedent(
    f"""\
    Question:
    Recursively find files that match '*.js', and filter out files with 'excludeddir' in their paths.
    Answer:
    find . -name '*.js' | grep -v excludeddir

    Question:
    Dump \"a0b\" as hexadecimal bytes.
    Answer:
    printf \"a0b\" | od -tx1

    Question:
    Create a tar ball of all pdf files in the current folder and any subdirectories.
    Answer:
    find . -name '*.pdf' | xargs tar czvf pdf.tar

    Question:
    Sort all files and directories in the current directory, but no subdirectories, according to modification time, and print only the seven most recently modified items.
    Answer:
    find . -maxdepth 1 -exec stat {escaped_stat_flags} \\; | sort -n -r | tail -n 7

    Question:
    Find all the empty directories in and under the current directory.
    Answer:
    find . -type d -empty"""
))

admonishment_template = PromptTemplate.from_template(
    f"Make sure you generate {my_shell} code that is {my_os}-compatible!"
)

pipeline_template = PipelinePromptTemplate(
    final_prompt=final_template,
    pipeline_prompts=[
        ("examples", examples_template),
        ("admonishment", admonishment_template),
    ]
)

# Only variables required by the pipeline prompts are listed here;
# the final template's `prompt` variable is supplied when we invoke the chain.
print(pipeline_template.input_variables)

[]


### View the completed prompt

In [24]:
prompt = dedent(f"""\
    Show me a {my_shell} script to print the first 50 files found under the current working directory
    that have been modified within the last week. Make sure you show the last modification time
    for each file in the output."""
)

# Uncomment to print the fully rendered prompt:
# print(pipeline_template.format(prompt=prompt))

chain = pipeline_template | model


### Run the model

In [26]:
# Pass the final template's variable by name.
response = chain.invoke({"prompt": prompt})
print(response)


## Adding a System Prompt

Finally, a _system prompt_ is the preferred way to provide additional instructions and clarity about the context for a task, especially when this same information applies for _all_ user queries in the application. When you are building an AI-enabled application for a set of use cases, you will probably spend a lot of time refining the system prompt to maximize the quality of the results!

Here we define a `system_prompt` to let the model know what we expect from it.

We then combine it with the few-shot examples and the user's question in a `chat_template`. Also, note that we move the sentence `Make sure you only generate {my_shell} code that is {my_os}-compatible!` to the system prompt, where it really belongs!
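
Before bringing in a framework, it may help to see the message ordering in plain Python: one system message, then alternating example questions and answers, then the real question. This is only a sketch; `build_messages` and `render_messages` are hypothetical helpers, and the role labels `System`, `Human`, and `AI` are illustrative.

```python
def build_messages(system: str, examples: list[tuple[str, str]], question: str) -> list[tuple[str, str]]:
    """Order the conversation: system prompt, few-shot pairs, then the user's question."""
    messages = [("System", system)]
    for example_question, example_answer in examples:
        messages.append(("Human", example_question))
        messages.append(("AI", example_answer))
    messages.append(("Human", question))
    return messages

def render_messages(messages: list[tuple[str, str]]) -> str:
    """Flatten (role, content) pairs into one 'Role: content' line per message."""
    return "\n".join(f"{role}: {content}" for role, content in messages)

demo_messages = build_messages(
    system="You are a helpful software engineer. Only generate bash code.",
    examples=[("Find all the empty directories in and under the current directory.",
               "find . -type d -empty")],
    question="List the ten largest files under the current directory.",
)
print(render_messages(demo_messages))
```

Keeping the OS and shell constraint in the system message means it applies to every question without being repeated in each one.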

In [31]:
examples = [
    {
        "question": "Recursively find files that match '*.js', and filter out files with 'excludeddir' in their paths.", 
        "answer": "find . -name '*.js' | grep -v excludeddir",
    },
    {
        "question": "Dump \"a0b\" as hexadecimal bytes.", 
        "answer": "printf \"a0b\" | od -tx1",
    },
    {
        "question": "Create a tar ball of all pdf files in the current folder and any subdirectories.", 
        "answer": "find . -name '*.pdf' | xargs tar czvf pdf.tar",
    },
    {
        "question": "Sort all files and directories in the current directory, but no subdirectories, according to modification time, and print only the seven most recently modified items.", 
        "answer": f"find . -maxdepth 1 -exec stat {stat_flags} \\; | sort -n -r | tail -n 7",
    },
    {
        "question": "Find all the empty directories in and under the current directory.", 
        "answer": "find . -type d -empty",
    },
]

In [44]:
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

system_prompt = SystemMessage(content=dedent(f"""\
    You are a helpful software engineer. You write clear, concise, well-commented code. 
    Make sure you only generate {my_shell} code that is {my_os}-compatible!"""
))

example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{question}"),
        ("ai", "{answer}"),
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

chat_template = ChatPromptTemplate.from_messages(
    [
        system_prompt,
        few_shot_prompt,
        ("human", "{question}"),
    ]
)

print(chat_template.input_variables)


['question']


### View the completed prompt

In [45]:
prompt = dedent(f"""\
    Show me a {my_shell} script to print the first 50 files found under the current working directory
    that have been modified within the last week. Make sure you show the last modification time
    for each file in the output."""
)

print(chat_template.format(question=prompt))

System: You are a helpful software engineer. You write clear, concise, well-commented code. 
Make sure you only generate bash code that is MacOS-compatible!
Human: Recursively find files that match '*.js', and filter out files with 'excludeddir' in their paths.
AI: find . -name '*.js' | grep -v excludeddir
Human: Dump "a0b" as hexadecimal bytes.
AI: printf "a0b" | od -tx1
Human: Create a tar ball of all pdf files in the current folder and any subdirectories.
AI: find . -name '*.pdf' | xargs tar czvf pdf.tar
Human: Sort all files and directories in the current directory, but no subdirectories, according to modification time, and print only the seven most recently modified items.
AI: find . -maxdepth 1 -exec stat -f "%m %N" {} \; | sort -n -r | tail -n 7
Human: Find all the empty directories in and under the current directory.
AI: find . -type d -empty
Human: Show me a bash script to print the first 50 files found under the current working directory
that have been modified within the las

### Run the model

In [46]:
chain = chat_template | model
response = chain.invoke({"question": prompt})
print(response)

AI: Here's a bash script to print the first 50 files found under the current working directory that have been modified within the last week, along with their last modification time:
```
find . -type f -mtime -7 | head -n 50 | while read file; do
 echo "$file $(stat -f %m $file)"
done
```
This script uses the `find` command to search for all files (`-type f`) that have been modified within the last week (`-mtime -7`). It then uses the `head` command to limit the output to the first 50 files found. Finally, it uses a `while` loop to read each file name and its last modification time from the output of the `stat` command and print them to the console. 


If you change the code to return the whole `response` rather than printing it, what additional information do you get?

Try invoking the chain several times. How do the responses change from one invocation to the next? Try different queries, and try adding more examples to the `examples` list or modifying the ones shown. Does this affect the outputs?
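
One way to compare repeated invocations is to collect the responses and count how many are distinct. The sketch below is not part of the original recipe: `sample_responses` is a hypothetical helper, and `fake_generate` is a stand-in that cycles through canned strings so the example runs without a model or API token.

```python
import itertools
from collections import Counter
from typing import Callable

def sample_responses(generate: Callable[[str], str], prompt: str, n: int = 5) -> Counter:
    """Call `generate` n times with the same prompt and tally identical responses."""
    return Counter(generate(prompt) for _ in range(n))

# Stand-in for a real model call; it alternates between two canned answers.
_canned = itertools.cycle([
    "find . -type f -mtime -7 | head -n 50",
    "find . -mtime -7 -type f | head -50",
])

def fake_generate(prompt: str) -> str:
    return next(_canned)

counts = sample_responses(fake_generate, "List recently modified files", n=4)
print(f"{len(counts)} distinct responses out of {sum(counts.values())} calls")
for text, count in counts.most_common():
    print(count, text)
```

With the chain defined earlier, you could pass something like `lambda p: chain.invoke({"question": p})` in place of `fake_generate`; keep `n` small, since each call is a real request.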