# Example of generating QAs for IRS PDF
In this example, we will show you how to generate question-answer pairs (QAs) from a PDF using Hugging Face models via `uniflow`'s [OpenAIJsonModelFlow](https://github.com/CambioML/uniflow/blob/main/uniflow/flow/model_flow.py#L125).

For this example, we're using an [IRS 2023](https://investors.nike.com/investors/news-events-and-reports/) PDF.

### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set up the environment by following the installation instructions: https://github.com/CambioML/uniflow/tree/main#installation.

Next, you will need a valid [OpenAI API key](https://platform.openai.com/api-keys) to run the code. Once you have the key, set it as the environment variable `OPENAI_API_KEY` within a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).

Finally, we store the input PDF in the `data/raw_input` directory as "IRS_2023.pdf", the filename used in the code below. You can download the file from [here](https://s1.q4cdn.com/806093406/files/doc_downloads/2023/414759-1-_5_Nike-NPS-Combo_Form-10-K_WR.pdf).

### Update system path

In [1]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Install helper packages

In [2]:
!{sys.executable} -m pip install langchain pandas pypdf



### Import dependencies

In [3]:
from dotenv import load_dotenv
import os
import pandas as pd
# from uniflow.flow.client import TransformClient
from uniflow.flow.config import TransformHuggingFaceConfig
from uniflow.op.model.model_config import HuggingfaceModelConfig
# from langchain.document_loaders import PyPDFLoader
from uniflow.op.prompt_schema import Context, GuidedPrompt

from uniflow.flow.config import ExtractPDFConfig
from uniflow.op.model.model_config import NougatModelConfig

# from uniflow.op.extract.split.markdown_header_splitter import MarkdownHeaderSplitter

from uniflow.op.extract.split.constants import MARKDOWN_HEADER_SPLITTER

from uniflow.pipeline import MultiFlowsPipeline
from uniflow.flow.config import PipelineConfig

load_dotenv()




False

### Prepare the input data
First, we need to pre-process the PDF to get text chunks that we can feed into the model. We will use `uniflow`'s `ExtractPDFConfig` with a Nougat model, then split the extracted text on Markdown headers.

In [4]:
pdf_file = "IRS_2023.pdf"

##### Set current directory and input data directory.

In [5]:
dir_cur = os.getcwd()
input_file = os.path.join(dir_cur, "data", "raw_input", pdf_file)
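
Before handing the path to the pipeline, it can help to confirm that the file actually exists, since a wrong working directory is an easy mistake in notebooks. The following is a minimal sketch; `resolve_input` is a hypothetical helper introduced here for illustration and is not part of `uniflow`.

```python
import os


def resolve_input(base_dir: str, filename: str) -> str:
    """Build the path to a raw input file and fail early if it is missing.

    `resolve_input` is a hypothetical helper, not a uniflow function.
    """
    path = os.path.join(base_dir, "data", "raw_input", filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Input PDF not found: {path}")
    return path
```

In this notebook it could be called as `input_file = resolve_input(os.getcwd(), pdf_file)`.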

#### Load the pdf using Nougat

In [6]:
data = [
    {"filename": input_file},
]

In [7]:
extract_config = ExtractPDFConfig(
    model_config=NougatModelConfig(
        model_name="0.1.0-small",
        batch_size=1,  # When batch_size > 1, Nougat runs on CUDA; otherwise it runs on CPU
    ),
    splitter=MARKDOWN_HEADER_SPLITTER,
)
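
To give an intuition for what splitting on Markdown headers means, here is a minimal pure-Python sketch. It is an illustration only: `split_on_markdown_headers` is a hypothetical function, and `uniflow`'s `MARKDOWN_HEADER_SPLITTER` may behave differently, for example in which header levels it recognizes.

```python
import re
from typing import List

# Matches lines that start with 1-6 '#' characters followed by whitespace.
_HEADER_RE = re.compile(r"^#{1,6}\s")


def split_on_markdown_headers(markdown: str) -> List[str]:
    """Split Markdown text into chunks, starting a new chunk at each header."""
    chunks: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        # A header closes the chunk collected so far (if any).
        if _HEADER_RE.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that are empty after stripping whitespace.
    return [chunk for chunk in chunks if chunk]


sample = (
    "## Future Developments\nGo to IRS.gov.\n\n"
    "## Capital Expenses\nYou must capitalize some costs."
)
for chunk in split_on_markdown_headers(sample):
    print(repr(chunk))
```

Each chunk keeps its header line, which matches the shape of the contexts in the output tables further down, where most contexts begin with `##` or `###`.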


#### Set up prompt

In [8]:
guided_prompt = GuidedPrompt(
    # instruction="Generate 3Q&A based on the context.",
    instruction="Given a piece of context, generate 3 different sets of Q&As if possible. Each set should contain context, question, and answer, which can be accessed by the users later.",
    examples=[
        Context(
            context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
            question="Who published A Mathematical Theory of Communication in 1948?",
            answer="Claude E. Shannon.",
        ),
        Context(
            context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
            question="When was A Mathematical Theory of Communication published?",
            answer="In 1948",
        ),
        Context(
            context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
            question="What book was published by Claude E. Shannon in 1948?",
            answer="A Mathematical Theory of Communication",
        ),
])
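
Conceptually, a guided prompt combines the instruction, the few-shot examples, and the new context into a single text prompt for the model. The sketch below shows one way this could be rendered. Both `render_prompt` and the exact layout are assumptions for illustration; `uniflow` may format its prompts differently. The `context:`, `question:`, and `answer:` labels mirror the keys that the post-processing step looks for.

```python
from typing import Dict, List


def render_prompt(instruction: str, examples: List[Dict[str, str]], context: str) -> str:
    """Render an instruction, few-shot examples, and a new context as text.

    Hypothetical layout for illustration; not uniflow's actual template.
    """
    parts = [f"instruction: {instruction}"]
    for example in examples:
        # Each example is written as "key: value" lines, one field per line.
        parts.append("\n".join(f"{key}: {value}" for key, value in example.items()))
    # The new context goes last so the model continues with question/answer.
    parts.append(f"context: {context}")
    return "\n\n".join(parts)


prompt = render_prompt(
    instruction="Generate Q&A based on the context.",
    examples=[
        {
            "context": "In 1948, Claude E. Shannon published A Mathematical Theory of Communication.",
            "question": "Who published A Mathematical Theory of Communication in 1948?",
            "answer": "Claude E. Shannon.",
        }
    ],
    context="You must capitalize, rather than deduct, some costs.",
)
print(prompt)
```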

In [8]:
# guided_prompt = GuidedPrompt(
#     instruction="Generate Q&A based on the context.",
#     examples=[
#         Context(
#             context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
#             question="Who published A Mathematical Theory of Communication in 1948?",
#             answer="Claude E. Shannon.",
#         ),
# ])

In [8]:
# guided_prompt = GuidedPrompt(
#     instruction="Generate 1 question and the corresponding answer based on the context, following the JSON format with Question and Answer as the two required keys. Calling output['context'], output['question'] and output['answer'] should return the corresponding context, question, and answer.",
#     examples=[]
# )

### Use LLM to generate data

In this example, we use the default LLM of `HuggingfaceModelConfig` to generate questions and answers.

Here, we pass our `guided_prompt` to `TransformHuggingFaceConfig` to use our customized instruction and examples instead of the `uniflow` defaults.

The responses come back as plain text rather than JSON, so we parse them line by line in the post-processing step.

In [9]:
transform_config = TransformHuggingFaceConfig(
    guided_prompt_template=guided_prompt,
    model_config=HuggingfaceModelConfig(),
)

In [10]:
p = MultiFlowsPipeline(PipelineConfig(
    extract_config=extract_config,
    transform_config=transform_config,
))
output = p.run(data)

Loading checkpoint shards: 100%|██████████| 2/2 [01:49<00:00, 54.86s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

INFO: likely hallucinated title at the end of the page: ## Costs You Can Deduct or Capitalize Page 27


100%|██████████| 1/1 [08:26<00:00, 506.03s/it]
100%|██████████| 197/197 [54:21<00:00, 16.56s/it] 


In [13]:
# output

Above, we called the `run` method on the pipeline object `p` to execute PDF extraction and question-answer generation on the input data.

### Process the output

Let's take a look at the generated output. We need to do a little post-processing on the raw output.

In [15]:
print(len(output[0]))

197
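
The post-processing code relies on the nesting of `output`: its first element is a list with one item per text chunk, each item has an `output` list, and each entry there holds a `response` list of raw strings. The following sketch walks a small hand-written mock with that shape. The mock values are invented for illustration, `iter_responses` is a hypothetical helper, and the real output may carry additional keys.

```python
from typing import Any, Dict, Iterator, List


def iter_responses(items: List[Dict[str, Any]]) -> Iterator[str]:
    """Yield every raw response string from a list of pipeline items."""
    for item in items:
        for entry in item.get("output", []):
            for response in entry.get("response", []):
                yield response


# Hand-written mock mirroring the nesting used in this notebook.
mock_output = [
    [
        {"output": [{"response": ["context: A\nquestion: Q1?\nanswer: A1"]}]},
        {"output": [{"response": ["context: B\nquestion: Q2?\nanswer: A2"]}]},
        {},  # An item without an "output" key is skipped.
    ]
]

responses = list(iter_responses(mock_output[0]))
print(len(responses))
```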


In [18]:
# Extracting context, question, and answer into a DataFrame
contexts = []
questions = []
answers = []

In [19]:
for item in output[0]:
    for i in item.get('output', []):
        for response in i.get('response', []):
            parts = response.split('\n')
            response_dict = {}
            last_key = None

            for part in parts:
                if ":" in part:
                    # Split on the first colon, regardless of whether there's a space after it
                    key, value = part.split(":", 1)
                    key = key.strip()  
                    value = value.strip()  
                    response_dict[key] = value
                    last_key = key
                elif last_key is not None:
                    response_dict[last_key] += " " + part
            
            if any(key not in response_dict for key in ['context', 'question', 'answer']):
                # print("[WARNING] Missing context, question or answer in response, skipping:\n", response)
                continue
            if "Claude E. Shannon" in response_dict['answer']:
                # print("[WARNING] Used example context, skipping:\n", response_dict["context"])
                continue
            contexts.append(response_dict['context'])
            questions.append(response_dict['question'])
            answers.append(response_dict['answer'])

pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', 1000)

print(len(contexts))
print(len(questions))
print(len(answers))

df = pd.DataFrame({
    'Context': contexts,
    'Question': questions,
    'Answer': answers
})

196
196
196
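
Packaging the parsing logic as a function makes it easy to try on a single response. The sketch below follows the same rules as the loop above: split each line on its first colon, and append lines without a colon to the previous field. The name `parse_response` and the sample text are introduced here for illustration.

```python
from typing import Dict


def parse_response(response: str) -> Dict[str, str]:
    """Parse "key: value" lines into a dict, joining continuation lines."""
    fields: Dict[str, str] = {}
    last_key = None
    for line in response.split("\n"):
        if ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            # A line without a colon continues the previous field.
            fields[last_key] += " " + line
    return fields


sample = (
    "context: Capital expenses are assets\nin your business.\n"
    "question: What are capital expenses?\n"
    "answer: Assets in your business."
)
parsed = parse_response(sample)
print(parsed["context"])
```

One limitation of this approach is that a continuation line that happens to contain a colon is treated as a new key rather than as part of the previous field.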


In [20]:
df

Unnamed: 0,Context,Question,Answer
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online. _Taxpayer Advocate Service_. If you have questions about your individual income tax return or need help with a problem involving the IRS, contact the Taxpayer Advocate Service (TAS). TAS is an independent office within the IRS that helps resolve problems that cannot be resolved through normal channels. Visit _IRS.gov/Advocacy_ for more information. _IRS Business Assistance Center_. For businesses, visit _IRS.gov/BusinessAssistantCenter_ for answers to common business tax questions and links to resources for small businesses. _IRS Small Business Self-Help_. For small businesses, visit _IRS.gov/SmallBizSelfHelp_ for answers to common small business tax questions and links to resources for small businesses. _IRS Large Business & Specialty Tax_. For large businesses and specialty taxes, visit _IRS.gov/LBSTax_ for answers to common large business and specialty tax questions and links to resources for large businesses and specialty taxes. _IRS International_. For international tax issues, visit _IRS.gov/International_ for answers to common international tax questions and links to resources for international tax issues. _IRS Publications_. 
For all IRS publications, visit _IRS.gov/Publications_ to view and download publications on various tax topics. _IRS Forms_. For all IRS forms, visit _IRS.gov/Forms_ to view and download forms on various tax topics. _IRS Instructions_. For all IRS instructions, visit _IRS.gov/Instructions_ to view and download instructions on various tax topics. _IRS Newsroom_. For news and updates from the IRS, visit _IRS.gov/Newsroom_. _IRS YouTube Channel_. For videos and tutorials from the IRS, visit _IRS.gov/YouTube_. _IRS Social Media_. Follow the IRS on social media at _Facebook_, _Twitter_, _LinkedIn_, and _Instagram_ for news, updates, and tips. _IRS Contact Information_. Call 800-829-4033 or visit _IRS.gov/ContactInformation_ for phone numbers, mailing addresses, and other ways to contact the IRS. _IRS Website_. Visit _IRS.gov_ for more information about taxes and the IRS. _IRS Security_. Protect yourself from scams and identity theft by keeping your personal and financial information secure. Use strong passwords, keep software up to date, use encryption when transmitting sensitive data, and shred confidential documents before throwing them away. Do not share personal or financial information over the phone, via email, or through text messages.",What book was published by Claude E. Shannon in 1948?,A Mathematical Theory of Communication
1,"## Future Developments For the latest information about developments related to Pub. 535, such as legislation enacted after it was published, go to _IRS.gov/Pub.535_.",Where can I find the latest information about developments related to Pub. 535?,Go to _IRS.gov/Pub.535_
2,"## What's New for 2022 The following items highlight some changes in the tax law for 2022. **Form 1098-k reporting transition period.** The transition period described in _Notice 2023-10_ delays the reporting of transactions in excess of 5600 to transactions that occur after calendar year 2022. The transition period is intended to facilitate an orderly transition for TPSO tax compliance, as well as individual payge compliance with income tax reporting. A participating payge, in the case of a third-party network transaction, is any person who accepts payment from a third-party settlement organization for a business transaction. **The COVID-19 related credit for qualified sick and family leave wages is limited to leave taken after March 31, 2020, and before October 1, 2021.** Generally, the credit for qualified sick and family leave wages, as enacted under the _Families_ First Coronavirus Response Act (FFCRA) and amended and extended by the COVID-related Tax Relief Act of 2020, for leave taken after March 31, 2020, and before April 1, 2021, and the credit for qualified sick and family leave wages under sections 3131, 3132, and 3133 of the Internal Revenue Code, as enacted under the American Rescue Plan Act of 2021 (the ARP), for leave taken after March 31, 2021, and before October 1, 2021, have expired. However, employers that pay qualified sick and family leave wages in 2022 for leave taken after March 31, 2020, and before October 1, 2021, are eligible to claim a credit for qualified sick and family leave wages in 2022. For more information, see _chapter 2_. **The COVID-19 related employee retention credit has expired.** The employee retention credit enacted under the Coronavirus Aid, Relief, and Economic Security (CARES) Act and amended and extended by the Taxplayer certainty and Disaster Tax Relief Act of 2020 was limited to qualified wages paid after March 12, 2020, and before July 1, 2021. 
The employee retention credit under section 3134 of the Internal Revenue Code, as enacted by the ARP and amended by the Infrastructure Investment and Jobs Act, was limited to wages paid after June 30, 2021, and before October 1, 2021, unless the employer was a recovery startup business. An employer that was a recovery startup business could also claim the employee retention credit for wages paid after September 30, 2021, and before January 1, 2022. For more information, see _chapter 2_. **Credit for COBRA premium assistance payments is limited to periods of coverage beginning on or after April 1, 2021, through periods of coverage beginning on or before September 30, 2021.** Section 9501 of the ARP provides for COBRA premium assistance in the form of a full reduction in the premium otherwise payable by certain individuals and their families who elect COBRA continuation coverage due to a loss of coverage as the result of a reduction in hours or an involuntary termination of employment (assistance eligible individuals). This COBRA premium assistance is available for periods of coverage beginning on or after April 1, 2021, through periods of coverage beginning on or before September 30, 2021. For more information, see _chapter 2_. **The CARES Act provided for a temporary increase in the contribution limit for traditional IRA contributions made by employees who were not covered by a retirement plan at work during 2020.** Under section 72(t)(2)(A) of the Internal Revenue Code, as enacted by the CARES Act, the maximum amount that may be contributed to a traditional IRA for the 2020 tax year was increased from $6,000 to $7,000 for individuals who did not participate in a workplace retirement plan during 2020. This provision applies only to contributions made to traditional IRAs, not to Roth IRAs. 
Contributions to traditional IRAs are generally deductible on an individual’s federal income tax return, while contributions to Roth IRAs are made with aftertax dollars but provide tax-free withdrawals in retirement.",What is the transition period for Form 1098-k reporting?,The transition period is described in Notice 2023-10 and it delays the reporting of transactions in excess of 5600 to transactions that occur after calendar year 2022.
3,## References * [1] * [2] The following reminders and other items may help you file your tax return. [MISSING_PAGE_POST],What book was published by Claude E. Shannon in 1948?,A Mathematical Theory of Communication
4,"## Capital Expenses You must capitalize, rather than deduct, some costs. These costs are a part of your investment in your business and are called 'capital expenses.' Capital expenses are considered assets in your business. In general, you capitalize three types of costs. * Business startup costs (see _Tip_ below). * Business assets. * Improvements. You can elect to deduct or amortize certain business startup costs. See chapters 7 and 8.",What book was published by Claude E. Shannon in 1948?,A Mathematical Theory of Communication
...,...,...,...
191,"### Reasonable period of time. A reasonable period of time depends on the facts and circumstances. Generally, actions that take place within the times specified in the following list will be treated as taking place within a reasonable period of time. 1. You give an advance within 30 days of the time the employee pays or incurs the expense. 2. Your employees adequately account for their expenses within 60 days after the expenses were paid or incurred. 3. Your employees return any excess reimbursement within 120 days after the expenses were paid or incurred. 4. You give a periodic statement (at least quarterly) to your employees that asks them to either return or adequately account for outstanding advances _and_ they comply within 120 days of the date of the statement. How to deduct You can claim a deduction for travel and non-entertainment-related meals expenses if you reinfunwise your employees for these expenses under an accountable plan. Generally, the amount you can deduct for non-entertainment-related meals subject to a 50% limit, discussed later. If you are a sole proprieter, or are filing as a single member limited liability company, deduct the travel reimbursement on line 24a and the deductible part of the non-entertainment-related meals reimbursement on line 24b of Schedule C (Form 1040). If you are filing an income tax return for a corporation, include the reimbursement on the _Other deductions_ line of Form 1120. If you are filling any other business income tax return, such as a partnership or S corporation return, deduct the reimbursement on the appropriate line of the return as provided in the instructions for that return. To qualify for this deduction, you must have an accountable plan in place. An accountable plan is a written policy that requires employees to submit receipts for all meal expenses and to report the expenses to you. The policy must also require employees to repay any amounts not substantiated by receipts. 
What is an accountable plan? An accountable plan is a written policy that requires employees to submit receipts for all meal expenses and to report the expenses to you. The policy must also require employees to repay any amounts not substantiated by receipts.",What book was published by Claude E. Shannon in 1948?,A Mathematical Theory of Communication
192,"## Per Diem and Car Allowances You can reimburse your employees under an accountable plan based on travel days, miles, or some other fixed allowance. In these cases, your employee is considered to have accounted to you for the amount of the expense that doesn't exceed the rates established by the federal government. Your employee must actually substantiate to you the other elements of the expense, such as time, place, and business purpose. Federal rate The federal rate can be figured using any one of the following methods.",What book was published by Claude E. Shannon in 1948?,A Mathematical Theory of Communication
193,"## The Taxpayer Advocate Service (TAS) Is Here To Help You With Your IRS Issues The Taxpayer Advocate Service (TAS) is an independent agency within the Internal Revenue Service (IRS). TAS helps taxpayers resolve their IRS issues quickly and efficiently. If you have a problem with your IRS account or need help with any aspect of your taxes, TAS can assist you.",Can the Taxpayer Advocate Service (TAS) help me with my IRS audit?,"Yes, the Taxpayer Advocate Service (TAS) can help you with your IRS audit."
194,"### What Is TAS? TAS is an _Independent_ organization within the IRS that helps taxpayers and protects taxpayer rights. Their job is to ensure that every taxpayer is treated fairly and that you know and understand your rights under the _Taxpayer Bill of Rights_. They also help with issues related to taxes, such as how to file taxes, what forms to use, and how to pay taxes on time.",What is TAS?,"TAS is an independent organization within the IRS that helps taxpayers and protects taxpayer rights. They ensure that every taxpayer is treated fairly and that you know and understand your rights under the Taxpayer Bill of Rights. They also help with issues related to taxes, such as how to file taxes, what forms to use, and how to pay taxes on time."


In [22]:
df_unique = df.drop_duplicates(subset=['Question', 'Answer'])
df_unique

Unnamed: 0,Context,Question,Answer
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online. _Taxpayer Advocate Service_. If you have questions about your individual income tax return or need help with a problem involving the IRS, contact the Taxpayer Advocate Service (TAS). TAS is an independent office within the IRS that helps resolve problems that cannot be resolved through normal channels. Visit _IRS.gov/Advocacy_ for more information. _IRS Business Assistance Center_. For businesses, visit _IRS.gov/BusinessAssistantCenter_ for answers to common business tax questions and links to resources for small businesses. _IRS Small Business Self-Help_. For small businesses, visit _IRS.gov/SmallBizSelfHelp_ for answers to common small business tax questions and links to resources for small businesses. _IRS Large Business & Specialty Tax_. For large businesses and specialty taxes, visit _IRS.gov/LBSTax_ for answers to common large business and specialty tax questions and links to resources for large businesses and specialty taxes. _IRS International_. For international tax issues, visit _IRS.gov/International_ for answers to common international tax questions and links to resources for international tax issues. _IRS Publications_. 
For all IRS publications, visit _IRS.gov/Publications_ to view and download publications on various tax topics. _IRS Forms_. For all IRS forms, visit _IRS.gov/Forms_ to view and download forms on various tax topics. _IRS Instructions_. For all IRS instructions, visit _IRS.gov/Instructions_ to view and download instructions on various tax topics. _IRS Newsroom_. For news and updates from the IRS, visit _IRS.gov/Newsroom_. _IRS YouTube Channel_. For videos and tutorials from the IRS, visit _IRS.gov/YouTube_. _IRS Social Media_. Follow the IRS on social media at _Facebook_, _Twitter_, _LinkedIn_, and _Instagram_ for news, updates, and tips. _IRS Contact Information_. Call 800-829-4033 or visit _IRS.gov/ContactInformation_ for phone numbers, mailing addresses, and other ways to contact the IRS. _IRS Website_. Visit _IRS.gov_ for more information about taxes and the IRS. _IRS Security_. Protect yourself from scams and identity theft by keeping your personal and financial information secure. Use strong passwords, keep software up to date, use encryption when transmitting sensitive data, and shred confidential documents before throwing them away. Do not share personal or financial information over the phone, via email, or through text messages.",What book was published by Claude E. Shannon in 1948?,A Mathematical Theory of Communication
1,"## Future Developments For the latest information about developments related to Pub. 535, such as legislation enacted after it was published, go to _IRS.gov/Pub.535_.",Where can I find the latest information about developments related to Pub. 535?,Go to _IRS.gov/Pub.535_
2,"## What's New for 2022 The following items highlight some changes in the tax law for 2022. **Form 1098-k reporting transition period.** The transition period described in _Notice 2023-10_ delays the reporting of transactions in excess of 5600 to transactions that occur after calendar year 2022. The transition period is intended to facilitate an orderly transition for TPSO tax compliance, as well as individual payge compliance with income tax reporting. A participating payge, in the case of a third-party network transaction, is any person who accepts payment from a third-party settlement organization for a business transaction. **The COVID-19 related credit for qualified sick and family leave wages is limited to leave taken after March 31, 2020, and before October 1, 2021.** Generally, the credit for qualified sick and family leave wages, as enacted under the _Families_ First Coronavirus Response Act (FFCRA) and amended and extended by the COVID-related Tax Relief Act of 2020, for leave taken after March 31, 2020, and before April 1, 2021, and the credit for qualified sick and family leave wages under sections 3131, 3132, and 3133 of the Internal Revenue Code, as enacted under the American Rescue Plan Act of 2021 (the ARP), for leave taken after March 31, 2021, and before October 1, 2021, have expired. However, employers that pay qualified sick and family leave wages in 2022 for leave taken after March 31, 2020, and before October 1, 2021, are eligible to claim a credit for qualified sick and family leave wages in 2022. For more information, see _chapter 2_. **The COVID-19 related employee retention credit has expired.** The employee retention credit enacted under the Coronavirus Aid, Relief, and Economic Security (CARES) Act and amended and extended by the Taxplayer certainty and Disaster Tax Relief Act of 2020 was limited to qualified wages paid after March 12, 2020, and before July 1, 2021. 
The employee retention credit under section 3134 of the Internal Revenue Code, as enacted by the ARP and amended by the Infrastructure Investment and Jobs Act, was limited to wages paid after June 30, 2021, and before October 1, 2021, unless the employer was a recovery startup business. An employer that was a recovery startup business could also claim the employee retention credit for wages paid after September 30, 2021, and before January 1, 2022. For more information, see _chapter 2_. **Credit for COBRA premium assistance payments is limited to periods of coverage beginning on or after April 1, 2021, through periods of coverage beginning on or before September 30, 2021.** Section 9501 of the ARP provides for COBRA premium assistance in the form of a full reduction in the premium otherwise payable by certain individuals and their families who elect COBRA continuation coverage due to a loss of coverage as the result of a reduction in hours or an involuntary termination of employment (assistance eligible individuals). This COBRA premium assistance is available for periods of coverage beginning on or after April 1, 2021, through periods of coverage beginning on or before September 30, 2021. For more information, see _chapter 2_. **The CARES Act provided for a temporary increase in the contribution limit for traditional IRA contributions made by employees who were not covered by a retirement plan at work during 2020.** Under section 72(t)(2)(A) of the Internal Revenue Code, as enacted by the CARES Act, the maximum amount that may be contributed to a traditional IRA for the 2020 tax year was increased from $6,000 to $7,000 for individuals who did not participate in a workplace retirement plan during 2020. This provision applies only to contributions made to traditional IRAs, not to Roth IRAs. 
Contributions to traditional IRAs are generally deductible on an individual’s federal income tax return, while contributions to Roth IRAs are made with aftertax dollars but provide tax-free withdrawals in retirement.",What is the transition period for Form 1098-k reporting?,The transition period is described in Notice 2023-10 and it delays the reporting of transactions in excess of 5600 to transactions that occur after calendar year 2022.
5,"### Cost recovery Although you generally cannot take a current deduction for a capital expense, you may be able to recover the amount you spend through deprecation, amortization, or depletion. These recovery methods allow you to deduct part of your cost each year. In this way, you are able to recover your capital expense. See _Amortization_ (chapter 8) and _Dealization_ (chapter 9) in this publication. A taxpayer can elect to deduct a portion of the costs of certain depreciable property as a section 179 deduction. A greater portion of these costs can be deducted if the property is qualified disaster assistance property. See Pub. 946 for details.",How do I recover my capital expenses?,"You can recover your capital expenses through deprecation, amortization, or depletion. See Amortization (chapter 8) and Depreciation (chapter 9) in this publication. Additionally, you can also deduct a portion of the costs of certain depreciable property as a Section 179 deduction. If the property is qualified disaster assistance property, you can deduct an even greater portion of the costs. For more information, see Pub. 946."
6,"## Going Into Business The costs of getting started in business, before you actually begin business operations, are capital expenses. These costs may include expenses for advertising, travel, or wages for training employees. Capital expenses are not deductible from your taxable income until they have been used up. This is known as depreciation.",Can capital expenses be immediately deducted from taxable income?,"No, capital expenses are not deductible from taxable income until they have been used up."
...,...,...,...
183,"## 2 Tax Deductions for Business Expenses In this section, we will discuss some common types of business expenses that are eligible for tax deductions. These include travel expenses, entertainment expenses, and office supplies.",What types of business expenses are discussed in this section?,"Travel expenses, entertainment expenses, and office supplies."
187,"### Reimbursers A ""reimbursement or allowance arrangement"" provides for payment of advances, reimbursments, and allowances for travel and non-entertainment-related meals expenses incurred by your employees during the ordinary course of business. If the expenses are substantiated, you can deduct the allowable amount on your tax return. Because of differences between accounting methods and tax law, the amount you can deduct for tax purposes may not be the same as the amount you deduct on your business books and records. For example, you can deduct 100% of the cost of meals on your business books and records. However, only 50% of these costs are allowed by law as a tax deduction.",What is a reimbursement or allowance arrangement?,"A reimbursement or allowance arrangement provides for payment of advances, reimbursals, and allowances for travel and non-entertainment-related meals expenses incurred by your employees during the ordinary course of business."
190,"### Excess reimbursement or allowance. An excess reimbursement or allowance is any amount you pay to an employee that is more than the business-related expenses for which the employee adequately accounted. The employee must return any excess reimbursement or other expense allowance to you within a reasonable period of time. If the employee fails to do so, you may treat it as taxable income.",How does excess reimbursement or allowance work?,An excess reimbursement or allowance occurs when an employee receives payment for expenses they did not actually incur. The employee must return this extra money to the employer within a reasonable time frame. Failure to do so could result in the excess being treated as taxable income.
193,"## The Taxpayer Advocate Service (TAS) Is Here To Help You With Your IRS Issues The Taxpayer Advocate Service (TAS) is an independent agency within the Internal Revenue Service (IRS). TAS helps taxpayers resolve their IRS issues quickly and efficiently. If you have a problem with your IRS account or need help with any aspect of your taxes, TAS can assist you.",Can the Taxpayer Advocate Service (TAS) help me with my IRS audit?,"Yes, the Taxpayer Advocate Service (TAS) can help you with your IRS audit."
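
Several rows in the tables above repeat a question and answer from the few-shot examples (for instance, "What book was published by Claude E. Shannon in 1948?") instead of asking about the IRS text. The check in the parsing loop only inspects the answer for "Claude E. Shannon", so echoes whose answer lacks that name slip through. One possible stricter filter, sketched here on plain dictionaries, drops any row whose question matches one of the example questions. `drop_example_echoes` is a hypothetical helper, not part of `uniflow`.

```python
from typing import Dict, Iterable, List

# Questions taken from the few-shot examples in the guided prompt.
EXAMPLE_QUESTIONS = {
    "Who published A Mathematical Theory of Communication in 1948?",
    "When was A Mathematical Theory of Communication published?",
    "What book was published by Claude E. Shannon in 1948?",
}


def drop_example_echoes(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rows whose question is not copied from the prompt examples."""
    return [row for row in rows if row["Question"].strip() not in EXAMPLE_QUESTIONS]


rows = [
    {
        "Question": "What book was published by Claude E. Shannon in 1948?",
        "Answer": "A Mathematical Theory of Communication",
    },
    {
        "Question": "What is TAS?",
        "Answer": "TAS is an independent organization within the IRS.",
    },
]
kept = drop_example_echoes(rows)
print([row["Question"] for row in kept])
```

With the DataFrame from this notebook, the same idea can be written as `df[~df['Question'].isin(list(EXAMPLE_QUESTIONS))]`.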


In [24]:
output_df = df_unique[['Context', 'Question', 'Answer']]

output_dir = 'data/output'

uniflow_output_path = f"{output_dir}/new_irs_QApairs.csv"

os.makedirs(output_dir, exist_ok=True)

output_df.to_csv(uniflow_output_path, index=False)
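
To confirm an export like this, the file can be read back and its header and row count checked. The sketch below uses Python's built-in `csv` module and writes a small stand-in file so that it runs on its own; the file name and the single row are placeholders, not results from this notebook.

```python
import csv
import os
import tempfile

# Write a tiny stand-in CSV with the same three columns as the export.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "qa_pairs_demo.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=["Context", "Question", "Answer"])
    writer.writeheader()
    writer.writerow({
        "Context": "TAS is an independent organization within the IRS.",
        "Question": "What is TAS?",
        "Answer": "An independent organization within the IRS.",
    })

# Read the file back and check that the columns and rows survived.
with open(csv_path, newline="", encoding="utf-8") as handle:
    reader = csv.DictReader(handle)
    rows = list(reader)
    columns = reader.fieldnames

print(columns, len(rows))
```

For the real export, point `csv_path` at `uniflow_output_path` and skip the writing step.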