# Attempting a fix

### First, the right mindset

One thing that's different about contributing to a large codebase, as opposed to a smaller one, is that things move more slowly. We want to make sure that our changes are truly **contributing** and not adding to a mess. This means you will spend much more time reading and understanding code than actually writing it.

How much more? Well, I've been contributing to large codebases for years, and for my first commit, which contributed six lines of code, it took around four hours of research before I made any local changes to the codebase.

### Getting set up

Ok, so even though we set up our code in the previous lesson, one issue we'll have is that contributions are constantly being made to the langchain codebase. To make sure we have the codebase in the correct state, let's use [Jigsaw's forked version](https://github.com/jigsawlabs-student/langchain_chat_model) of the codebase.

To do so, paste the following URL into your browser:

`https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/data-engineering-jigsaw/langchain_lab`


> Remember, you may have to paste this line in your browser twice if Docker was not running.

Then go to the `libs/community` library:
    
```bash
cd libs/community
```

And run: 
```bash
poetry install --with lint,typing,test,test_integration
pip3 install rapidfuzz
```

Then run `make test` to confirm that you are properly set up.

### Reviewing our Issue

So in this lesson, we can begin a specific task. Remember that our issue is to standardize the init arguments across the different models and libraries that integrate with langchain.

We can see the [issue here](https://github.com/langchain-ai/langchain/issues/20085).  

And we can also see that a number of pull requests related to the issue have already been merged.

<img src="./merge-commits.png" width="60%">

So what's left?  And how can we figure this out?

#### Doing some research

Well, if we click through the merged pull requests, we can see the kind of changes we should make. For example, below are the files changed for [baidu](https://github.com/langchain-ai/langchain/pull/20166/files), [spark](https://github.com/langchain-ai/langchain/pull/20194/files), [mistral](https://github.com/langchain-ai/langchain/pull/20163/files), and [anthropic](https://github.com/langchain-ai/langchain/pull/20161/files).

<img src="./chat-model-change.png" width="22%"> <img src="./spark-llm.png" width="22%"> <img src="./mistral-change.png" width="22%"> <img src="./anthropic.png" width="22%"> 

From there, we can start to write down where to look for similar models to change:

* `libs/community/langchain_community/chat_models`
* `libs/partners/*`

So there are really just two main places to examine: the `chat_models` folder and the packages under `libs/partners/`. (We should also watch for related documentation that needs updating, since that was part of the anthropic pull request.)

Let's go to our local codebase and open them up.  Here's what we'll see under `chat_models` and `libs/partners`.

<img src="./chat_models_open.png" width="20%"> <img src="./libs_partners.png" width="10%">

So there are several libraries we could work on. From here, we can try to identify one that is a good fit for a contribution.

### Finding a task

Again, this is our first commit, so we want it to be easy.

> **Protip**: It's difficult to know what's easy before you try to make the commit. But just remember, if you get stuck and find yourself going in circles, you can always **retreat**. In other words, see if you can use what you learned to make a simpler contribution. For example, I originally tried to contribute to the huggingface chat model, but found myself spending hours just trying to get set up with huggingface. Eventually, I moved on to another library, `perplexity`, and used what I learned with huggingface to make my contribution there.

Ok, so the contribution we'll ultimately make is to standardize the init arguments for the perplexity library.  You can see the relevant code for perplexity [here](https://github.com/jigsawlabs-student/langchain_chat_model/blob/master/libs/community/langchain_community/chat_models/perplexity.py).

Why perplexity? First, as we'll see, perplexity has some of the init arguments that need to be updated. Second, it looks like our changes can be relatively small, which we can check by searching for files with `perplexity` in their names.

> In VSCode, press `cmd + p` (`ctrl + p` on Windows and Linux) and then type `perplexity`.

<img src="./perplexity-search.png" width="80%">

As we can see, only a few files in the codebase are named after perplexity. So at first glance, just three files may need to be changed.
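If you'd rather search from the terminal, or you also want the files whose *contents* mention perplexity (`cmd + p` only matches file names; VSCode's `cmd + shift + f` searches contents), here's a small standard-library sketch. The `find_mentions` helper is hypothetical, written just for this lesson, and the `__main__` block assumes you run it from the repository root so that `libs` is a valid relative path.

```python
from pathlib import Path
from typing import List, Tuple


def find_mentions(root: str, term: str) -> Tuple[List[str], List[str]]:
    """Return (files whose name contains term, files whose text contains term).

    Matching is case-insensitive. Paths are relative to root and sorted.
    Hidden files and directories (e.g. .git, .venv) are skipped, and files
    that can't be read as UTF-8 text are left out of the content matches.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return [], []
    term_lower = term.lower()
    by_name: List[str] = []
    by_content: List[str] = []
    for path in root_path.rglob("*"):
        rel = path.relative_to(root_path)
        # Skip anything hidden, including everything inside .git.
        if any(part.startswith(".") for part in rel.parts):
            continue
        if not path.is_file():
            continue
        if term_lower in path.name.lower():
            by_name.append(str(rel))
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue
        if term_lower in text.lower():
            by_content.append(str(rel))
    return sorted(by_name), sorted(by_content)


if __name__ == "__main__":
    names, contents = find_mentions("libs", "perplexity")
    print("File names mentioning the term:", *names, sep="\n  ")
    print("File contents mentioning the term:", *contents, sep="\n  ")
```

Expect the content search to turn up more files than the name search, since it also catches imports, docs, and tests that refer to the class.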

#### Changing init arguments

The next thing to do is to identify potential init arguments that we can change.

If you look at the `ChatPerplexity` model, you'll see the following fields:

```python
client: Any  #: :meta private:
model: str = "pplx-70b-online"
"""Model name."""
temperature: float = 0.7
"""What sampling temperature to use."""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Holds any model parameters valid for `create` call not explicitly specified."""
pplx_api_key: Optional[str] = None
"""Base URL path for API requests, 
leave blank if not using a proxy or service emulator."""
request_timeout: Optional[Union[float, Tuple[float, float]]] = None
"""Timeout for requests to PerplexityChat completion API. Default is 600 seconds."""
max_retries: int = 6
"""Maximum number of retries to make when generating."""
streaming: bool = False
"""Whether to stream the results or not."""
max_tokens: Optional[int] = None
"""Maximum number of tokens to generate."""
```

These are all init arguments, declared as fields on a model defined with the [Pydantic](https://docs.pydantic.dev/latest/) library. Now let's take another look at the ideal init arguments so we can compare the two:

```python
model: str  # model name
api_key: str  # api key
temperature: float  # temperature sampling
timeout: ...  # request timeout
max_tokens: int  # max tokens
stop_sequences: ...  # stop sequences
max_retries: int  # max num retries
```

Then go through them one by one to see which changes need to be made. Below, the arguments that need no change are commented out:

```python
# model: str  # model name (no change needed)
api_key: str  # api key
# temperature: float  # temperature sampling (no change needed)
timeout: ...  # request timeout
# max_tokens: int  # max tokens (no change needed)
# stop_sequences: ...  # stop sequences (no change needed)
# max_retries: int  # max num retries (no change needed)
```

Ok, so it looks like we should update `api_key`, which is currently `pplx_api_key`, and `timeout`, which is currently `request_timeout`.
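To double-check that comparison, here's a small sketch that treats both sides as plain lists of names. The field names are copied by hand from the `ChatPerplexity` excerpt above, so it is only as accurate as that excerpt; it does not inspect the real class.

```python
# Field names copied from the ChatPerplexity excerpt above.
current_fields = [
    "client", "model", "temperature", "model_kwargs", "pplx_api_key",
    "request_timeout", "max_retries", "streaming", "max_tokens",
]

# Standardized init argument names from the issue.
standard_args = [
    "model", "api_key", "temperature", "timeout",
    "max_tokens", "stop_sequences", "max_retries",
]

# Standard names that ChatPerplexity does not accept yet.
missing = [arg for arg in standard_args if arg not in current_fields]

# Existing fields that are not standard names (some may just need a new name).
nonstandard = [field for field in current_fields if field not in standard_args]

print("missing:", missing)
print("nonstandard:", nonstandard)
```

Matching the missing names to existing fields by meaning gives `pplx_api_key` → `api_key` and `request_timeout` → `timeout`. `stop_sequences` has no counterpart in the excerpt, so there is nothing to rename for it, and `client`, `model_kwargs`, and `streaming` simply fall outside the standard list.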

And how do we actually make those changes? Once again, we'll look to similar parts of the library to determine the fix. In this case, you can look at the related [Baidu pull request](https://github.com/langchain-ai/langchain/pull/20166/files) or the [Mistral pull request](https://github.com/langchain-ai/langchain/pull/20163/files) to see how the codebase was updated.

> Even if we did not have related pull requests, we would look at similar files to identify a pattern, so that our change follows the codebase's conventions.
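One common way to let a Pydantic model accept a new argument name without breaking the old one is a field alias. Assuming the linked pull requests follow that approach (check their diffs for the exact form), here's a minimal sketch on a hypothetical stand-in class, `PerplexityArgsSketch`, rather than the real `ChatPerplexity`. It's written against Pydantic v2, where the relevant config option is `populate_by_name`; langchain code from this period generally imports Pydantic v1 through `langchain_core.pydantic_v1`, where the equivalent option is `allow_population_by_field_name`.

```python
from typing import Optional, Tuple, Union

from pydantic import BaseModel, ConfigDict, Field


class PerplexityArgsSketch(BaseModel):
    """Hypothetical stand-in for ChatPerplexity's init arguments (not the real class)."""

    # Accept either the field name (old spelling) or its alias (standard spelling).
    model_config = ConfigDict(populate_by_name=True)

    model: str = "pplx-70b-online"
    # Keep the existing field name, but also accept the standardized name.
    pplx_api_key: Optional[str] = Field(default=None, alias="api_key")
    request_timeout: Optional[Union[float, Tuple[float, float]]] = Field(
        default=None, alias="timeout"
    )


# Both spellings end up in the same attribute.
new_style = PerplexityArgsSketch(api_key="test-key", timeout=30.0)
old_style = PerplexityArgsSketch(pplx_api_key="test-key", request_timeout=30.0)
print(new_style.pplx_api_key, new_style.request_timeout)
print(old_style.pplx_api_key, old_style.request_timeout)
```

Without `populate_by_name`, Pydantic would only accept the alias, and with its default `extra="ignore"` behavior it would silently drop the old `pplx_api_key` spelling. Keeping both spellings working is what lets the change standardize the arguments without breaking existing callers.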

### Trying it out

The first step, before making any changes, is to run the related tests. So open up two (and only two) files on your screen:

* `libs/community/tests/unit_tests/chat_models/test_perplexity.py`
* `libs/community/langchain_community/chat_models/perplexity.py`

Every other file (maybe except the Makefile) should be closed.

<img src="./test-perplexity.png" width="60%">

Next, let's see if we can get these tests to run. If you look at the Makefile, you'll see that the test target passes a `TEST_FILE` variable to pytest, which tells us how to run a single test file.

```makefile
test tests integration_tests:
	poetry run pytest $(TEST_FILE)
```

So run the following:

```bash
poetry run pytest tests/unit_tests/chat_models/test_perplexity.py
```

You should see three tests pass.

Now the next step is to try to make a fix. That means updating both the `test_perplexity.py` file and the `chat_models/perplexity.py` file. Use the [Baidu pull request](https://github.com/langchain-ai/langchain/pull/20166/files) and the [Mistral pull request](https://github.com/langchain-ai/langchain/pull/20163/files) as examples, and try to make the change.

Remember, the init arguments we want to update are the following:

```
api_key: str  # api key
timeout: ...  # request timeout
```

So take a break from this reading, come back in ten minutes, and then give it a shot.

### Summary

In this lesson, we moved toward identifying the task for us to fix. We began by reviewing the pull requests related to the issue to see the types of files that were changed. Then we identified specific parts of the codebase that could use similar contributions.

Once we settled on the perplexity files, our first step was to **run the tests**. In this case, that wasn't difficult, but in other libraries it may require a lot of environment setup.