## AI/ML Coding Round
### Problem: 
Train a LLM that can answer queries about JFrog Pipelines' [native steps](https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps). 
When posed with a question like "How do I upload an artifact?" or "What step should I use for an Xray scan?", the model should list the appropriate native step(s) and provide an associated YAML for that step.

 ### Requirements
1. Data Collection: Acquire publicly available information on Native Steps from JFrog's website that contain information on native steps for building pipelines. Data that is not publicly accessible falls outside the scope of this coding challenge. (https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps)
2. Data Preprocessing: Process the text to make it suitable for training. This might involve tokenization, stemming, and other NLP techniques.
3. Model Training: Train a LLM on the (preprocessed) dataset. You can choose one of the freely available open source model like BERT or any other model available
4. Query Handling: Implement a function that takes a user query as input and returns the appropriate native step(s) and a sample YAML configuration.
5. YAML Generation: Implement a function that can generate a sample YAML configuration based on the identified native step(s).
------------

```python
@author - Midhun Kumar
@email  - midhunkumar04@agmail.com
```
-----------------------------------------------------------------------

### 1. Data scraping and preparation


#### Data Understanding
1. I have started exploring the documentation sites given in the problem and noted the below points.
    - The site has multiple navigation links for various pipline procedures.
    - Each discipline procedure has two main things, like instructions and yaml pipline exampls we can use them for training
    - If we can extract these links, we can use them to scrape the data from each document and use them for preparation.

1.1 Link Extraction
 -  Extarcting the pipeline documentaion links from following site 
https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps


In [2]:
# importing libraries for link extraction
from bs4 import BeautifulSoup as soup
import requests
from pprint import pprint

In [3]:
# Link extraction function
def linkExtarction(url:str) -> list:
    '''
    This function takes url:str as input and retrun list of link in the site
    '''
    result = requests.get(url)
    scrapedData = soup(result.content)
    links =  scrapedData.select('a')
    return [link['href'] for link in links]

In [4]:
# verifying the links
url = 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps'
links = linkExtarction(url)
pprint(links)

['https://jfrog.com/help/r/jfrog-pipelines-documentation',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/jfrog-pipelines',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-use-cases',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-concepts',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-step-by-step',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/see-it-live',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-quickstart',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-hello-world',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-docker-build-and-push',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-release-to-edge-node',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-go-build',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-npm-build',
 'https://jfrog.com/he

Remarks:
- From the above results, we can see we have extracted all the links from the site.
- We can also observe that many links do not have proper naming conventions, so we are going to focus on only the links with proper naming so that we can use them to label the data.

In [5]:
''' 
Filtering only required links 
'''
import re # importing regex for string matching

def filter_links(links:list)->list:
    ''' 
    This function will take list of links and return filter list based on condition
    '''
    filterd_links = []
    for link in links:
        matchString = 'https://jfrog.com/help/r/jfrog-pipelines-documentation/'
        if re.search(matchString,link):
            filterd_links.append(link)
    return filterd_links

In [6]:
pprint(filter_links(links))

['https://jfrog.com/help/r/jfrog-pipelines-documentation/jfrog-pipelines',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-use-cases',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-concepts',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-step-by-step',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/see-it-live',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-quickstart',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-hello-world',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-docker-build-and-push',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-release-to-edge-node',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-go-build',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-npm-build',
 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipeline-example-maven-b

Reamrks:
- We have filterd all the required links

1.2 Now will work on data extarction for each link

In [18]:
''' 
Experimenting data extartcion  
'''
import html2text
url = 'https://jfrog.com/help/r/jfrog-pipelines-documentation/creating-dynamic-nodes-on-kubernetes'
# test = soup(responce.text, 'html.parser')
def htmlTotext(url:str) -> str:
    
    responce = requests.get(url)
    h = html2text.HTML2Text()
    h.body_width = 0
    h.ignore_links = True # to ignore the links 
    h.ignore_images = True # to ignore the images 
    h.ignore_tables = True  # To remove table tags and ignore the table 
    # h.ignore_emphasis =True
    h.mark_code = True # to mark if code snipets are present its importent to identify yaml schma
    h.single_line_break = True
    return str(h.handle(responce.text))

print(htmlTotext(url))

# Creating Dynamic Nodes on Kubernetes
## JFrog Pipelines Documentation
ft:sourceType
    Paligo
  * JFrog Pipelines
  * Pipelines Use Cases
  * Pipelines Concepts
  * Pipelines Step-By-Step
  * See it Live
  * Pipelines Quickstart
  * Pipeline Example: Hello World
  * Pipeline Example: Docker Build and Push
  * Pipeline Example: Release to Edge Node
  * Pipeline Example: Go Build
  * Pipeline Example: Npm Build
  * Pipeline Example: Maven Build
  * Pipeline Example: Helm Blue-Green Deploy
  * Configuring Pipelines
  * Managing Pipelines Integrations
  * Managing Pipeline Sources
  * Pipeline Source Sync Recovery
  * Managing Pipelines Node Pools
  * Creating Custom VM Images
  * Creating Dynamic Nodes on Kubernetes
  * Managing Pipelines Static Nodes
  * Sending Pipelines Nodes Agent Logs to Logstash
  * Creating Pipelines
  * Defining a Pipeline
  * Pipelines Integrations
  * Airbrake Integration
  * Artifactory Integration
  * AWS Keys Integration
  * Azure Keys Integration
  * Digi

Remarks:
 - From above obervations we have removed many unwanted lines and data by using various `html2text` options
 - We can also see output format is `markdown` its usefull when we show the output in GenAi as markdown
 - We also added `code block` in output which will useful when we print and extarct from bot output
 - Still we need to remove some data like sidebar data and some common unwanted lines(`back to homepage, ## JFrog Pipelines Documentation etc`) from data
 - `From various links  have verified first 250 lines and last line are same in all the docuemnt hence we can stripe them`

In [19]:
data = htmlTotext(url)
data = ('\n').join(data.split('\n')[250:-2])

In [20]:
data = data + f'\n\n Document url for reference - {url}'  # we can append this for reference link in answer
print(data)

This tutorial explains how to specify a kubeconfig for a Kubernetes Integration to authenticate to a self-hosted Kubernetes cluster for a dynamic node pool. You can use a cloud provider solution like EKS, GKE, or AKS, or a self-hosted Kubernetes solution.
This tutorial assumes that you have working knowledge of Docker and Kubernetes and understand the following concepts:
  * Self hosting on GCP
  * Self hosting on AWS
  * kubeconfig files
  * Configuring Service Accounts


##### Configure a Kubernetes Service Account
You must  configure a service account in Kubernetes to provide an identity for the build node processes that Pipelines will dynamically control.
This procedure will use your personal account to create the service account. Make sure your personal account has permissions to do this.
###### Verify Access to the Cluster
First, make sure you can authenticate yourself to the cluster. This means you have a kubeconfig file that uses your personal account. You can verify this by ru

Remarks:
`Able to successully extract the data for our desired format now we can append the data from each link and prepare the data for training`

In [37]:
''' 
final Function to concat all the data
'''
import pandas as pd # for dataframe creation and convert to csv
from tqdm import tqdm
pd.set_option('display.max_colwidth',None)

def dataOrchestrator(url:str) -> pd.DataFrame:
    ''' 
    This function will take the parent portal URL and extrcat links and data from all pipline procedures
    and return pandas DataFareme 
    '''
    dataFrame = pd.DataFrame(columns=['Title', 'PiplineProcess'])
    all_links = tqdm(linkExtarction(url), desc=f'Extrating all the links from - {url}')
    pipline_links = tqdm(filter_links(all_links), desc=f'Filtering only pipline links')
    for link in tqdm(pipline_links, desc=f'Data extraction in progress...'):
        temp_data = htmlTotext(link)
        data = ('\n').join(temp_data.split('\n')[250:-2]) # striping first 250 and last line
        data = data + f'\n\n Document url for reference - {url}' # Adding refernce URL at the end of every answer
        templist = [[link[55:], data]]
        dataFrame = dataFrame._append(pd.DataFrame(templist, columns=['Title', 'PiplineProcess']),ignore_index=True)
    return dataFrame
        

In [36]:
# Finale data extraction
parentUrl = 'https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps'
final_data = dataOrchestrator(parentUrl)
final_data

Extrating all the links from - https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps: 100%|██████████| 329/329 [00:00<00:00, 249264.09it/s]
Filtering only pipline links: 100%|██████████| 244/244 [04:33<00:00,  1.12s/it]
Data extraction in progress...: 100%|██████████| 244/244 [04:33<00:00,  1.12s/it]


Unnamed: 0,Title,PiplineProcess
0,jfrog-pipelines,JFrog Pipelines offers JFrog Platform customer...
1,pipelines-use-cases,Let's explore some of the most common ways to ...
2,pipelines-concepts,"Before learning how to use Pipelines, here are..."
3,pipelines-step-by-step,After you have a Pipelines installation workin...
4,see-it-live,Have we piqued your interest? Ready to see som...
...,...,...
239,managing-runtimes,Every step in your pipeline executes on a buil...
240,choosing-your-runtime-image,"By default, your steps run inside a container ..."
241,running-steps-on-the-host,When you need to execute your step directly on...
242,choosing-node-pools,"A pipeline can, if necessary, control through ..."


In [42]:
final_data.iloc[[0,78,101,234]]

Unnamed: 0,Title,PiplineProcess
0,jfrog-pipelines,"JFrog Pipelines offers JFrog Platform customers three vital capabilities: end-to-end automation (CI/CD), workflow and tool orchestration, and the optimization of the JFrog toolset functionality in use. Consistent with JFrog’s customer-centric product philosophy, Pipelines is enterprise-ready and universal.\n### Workflow Automation\nA pipeline is an event-driven automated workflow for executing a set of DevOps activities (CI, deployments, infrastructure provisioning, etc). It is composed of a sequence of interdependent **steps** which execute discrete functions. Steps act on **resources** , which hold the information needed to execute (files, key-value pairs, etc).\nDevelopers can create pipelines easily with a simple declarative YAML-based language. While each step in a pipeline executes in a stateless runtime environment, Pipelines provides facilities to manage state and step outputs across the workflow so that all dependent steps can access the information they need from upstream steps in order to execute. This helps coordinate activities centrally across diverse DevOps tools and teams without custom DIY scripts.\nWorkflows can be configured for a variety of scenarios, including:\n * Continuous Integration for your applications\n * Continuous Delivery workflows that connect all your CI/CD and DevOps activities across tools and functional silos\n * Automate IT Ops workflows like infrastructure provisioning, security patching, and image building\n\n\n### Get up and running with JFrog Pipelines\nIn this section, you will find information to get you started whether you are a new user or an existing user.\n * If you do not yet have a subscription, get started with trial subscription of the JFrog Platform on the Cloud.\n * If you are a new user, get started with the onboarding videos for JFrog Pipelines.Onboarding Best Practices: JFrog Pipelines\n\n\n### Features\n#### Pipelines as Code\nDefine your automated workflow through code, using a domain specific language in a YAML file of key-value pairs that you can create and maintain with your favorite text editor.\n#### Real Time Visibility\nJFrog Pipelines renders your pipeline definition as an interactive diagram, helping you to see the flow of tasks and their inter-dependencies, as well as view the success record of any runs that were performed.\n#### Universal\nConnect your pipeline automation to your source code repositories in a version control system (such as GitHub or BitBucket) to automatically trigger execution on any new submission (commit) of a code change. Connect to other popular tools through your credentials for storage, issue-tracking, notification, orchestration and more through a library of integrations.\n#### Native Integration with Artifactory\nJFrog Pipelines is designed to be used with Artifactory, with built-in directives for pushing artifacts, performing builds, pushing build information, image scanning, and build promotion.\n#### Integration with JFrog Platform\nJFrog Pipelines is designed as an integral part of the JFrog platform, including scanning artifacts/builds through Xray, the creation and delivery of release bundles through JFrog Distribution, for a complete end-to-end SDLC pipeline from commit to production runtime.\n#### Security First\nFine-grained permissions and access control limit who can access workflows. Centralized, encrypted storage of credentials and keys help ensure secrets stay safe.\n#### Enterprise-Ready\nManage multiple execution nodes using a single installation of Pipelines and automatically distribute Pipeline execution across them for scale and speed.\n### Watch the Screencast\n\n Document url for reference - https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps"
78,dockerbuild,"The **DockerBuild** native step performs a build to produce a Docker image from a Dockerfile in a GitRepo source repository resource.\nIn the step configuration, you must provide the name ( `dockerFileName` ) and directory ( `dockerFileLocation` ) of the Dockerfile that contains the command to be processed by a `docker build` command, as well as the name ( `dockerImageName` ) and tag ( `dockerImageTag` ) of the resulting image. The image is built on the build node, and information about that image is stored in the run state.\nTo build a Docker image that relies on a private base image:\n 1. Define the base image as an Image resource, with `autoPull` set to `true`.\n 2. Specify the Image resource as one of the `inputResources` of the DockerBuild step.\n\n\nTo include artifacts in the Docker image that are not part of the GitRepo source repository:\n 1. Define a FileSpec resource that specifies the files to include from Artifactory.\n 2. Specify the FileSpec resource as one of the `inputResources`of the DockerBuild step.\n\n\n### Proper usage of DockerBuild step\nDockerBuild and DockerPush steps must be assigned to the same `affinityGroup` to share state. If this is not done, the output of DockerBuild will not be available for DockerPush. For more information on using `affinityGroup`, see Running multiple steps on the same build node.\n### Docker Build and Push Quickstart\nThis Docker Build and Push quickstart demonstrates the definition of a pipeline that uses the DockerBuild and DockerPush native steps to build a single Docker Image, push it to Artifactory, and then publish the BuildInfo.\n##### YAML Schema\nThe YAML schema for DockerBuild native step is as follows:\n **DockerBuild**\n[code]\n pipelines: \n - name: <string>\n steps:\n - name: <string>\n type: DockerBuild\n configuration:\n #inherits all the tags from bash\n affinityGroup: <string>\n dockerFileLocation: <string>\n dockerFileName: <string>\n dockerImageName: <string>\n dockerImageTag: <string>\n dockerOptions: <string>\n \n integrations:\n - name: <artifactory or docker registry integration> # required\n \n inputResources:\n - name: <GitRepo resource> # required, git repository containing your Dockerfile\n - name: <Image resource> # optional base image\n - name: <FileSpec resource> # optional\n \n execution:\n onStart:\n - echo ""Preparing for work...""\n onSuccess:\n - echo ""Job well done!""\n onFailure:\n - echo ""uh oh, something went wrong""\n onComplete: #always\n - echo ""Cleaning up some stuff""\n[/code]\n##### Tags\n###### name\nAn alphanumeric string (underscores are permitted) that identifies the step.\n###### type\nMust be `DockerBuild` for this step type.\n###### configuration\nSpecifies all configuration selections for the step's execution environment. This step inherits the Bash/ PowerShell step configuration tags, including these pertinenttags:\nTag\n **Description of usage**\nRequired/Optional \n`affinityGroup`\nMust specify an affinity group string that is the same as specified in a subsequent DockerPush step.\nOptional \n`inputResources`\nMust specify:\n * a GitRepo resource (that contains the Dockerfile)\n\n\nOptionally, you may also specify:\n * One or more Image resources to pull base images used in the build or to trigger this build.\n * One or more FileSpec resources that specify what files to include in the build context. These files are automatically copied to `dockerFileLocation`.\n\n\nRequired/Optional \nIn addition, these tags can be defined to support the step's native operation:\n### Tags derived from Bash\nAll native steps derive from the Bash step. This means that all steps share the same base set of tags from Bash, while native steps have their own additional tags as well that support the step's particular function. So it's important to be familiar with the Bash step definition, since it's the core of the definition of all other steps.\nTag\n **Description of usage**\nRequired/Optional \n`dockerFileLocation`\nDirectory containing the Dockerfile, which is the file that has Docker build configuration. This file is also used as the context for the Docker build. The path provided should be relative to the root of the input GitRepo repository. If no location is provided, the default is the root of the GitRepo repository.\nRequired \n`dockerFileName`\nName of the Dockerfile.\nRequired \n`dockerImageName`\nThe name of the Docker image to create. This can be set using environment variables or triggering a run using parameters.\nRequired \n`dockerImageTag`\nThe tag for the Docker image to create. This can be set using environment variables or triggering a run using parameters.\nRequired \n`dockerOptions`\nAdditional options for the docker build command.\nOptional \n###### execution\nDeclares collections of shell command sequences to perform for pre- and post-execution phases:\nTag\n **Description of usage**\nRequired/Optional \n`onStart`\nCommands to execute in advance of the native operation\nOptional \n`onSuccess`\nCommands to execute on successful completion\nOptional \n`onFailure`\nCommands to execute on failed completion\nOptional \n`onComplete`\nCommands to execute on any completion\nOptional \nThe actions performed for the `onExecute` phase are inherent to this step type and may not be overridden.\n##### Examples\nThe following examples use a GoLang Git repository represented by a GitRepo resource named `gosvc_app` to create a Docker image that is published to Artifactory. They assume that an Artifactory integration named `MyArtifactory` has been created, and that the Artifactory instance has a Docker repository mapped to `docker.artprod.company`.\n * These examples require an Artifactory Integration and a GitHub Integration.GitHub Integration\n * The Pipelines DSL for a similar example is available in this repository in the JFrog GitHub account.\n * For a full tutorial, see Pipeline Example: Docker Build and Push.\n\n\nThe following resources declarations support these examples. Not all of these resources are used in all examples.\n###### Resources\n[code]\n resources:\n # Application source repository\n - name: gosvc_app\n type: GitRepo\n configuration:\n gitProvider: myGithub\n path: myuser/myrepo # replace with your repository name\n branches:\n include: master\n \n # Docker image in an Artifactory repository\n - name: base_image\n type: Image\n configuration:\n registry: myArtifactory\n sourceRepository: docker-local # replace with your repository name\n imageName: docker.artprod.mycompany.com/baseimage\n imageTag: latest\n autoPull: true\n \n # Files in an Artifactory repository\n - name: icon_files\n type: FileSpec\n configuration:\n sourceArtifactory: myArtifactory\n pattern: my-local-repo/all-my-images/\n target: icons/\n[/code]\n###### Build a Docker image from a source repository\nThis example builds a Docker image to a Docker registry in Artifactory. The tag for the image is set to the pipeline's run number.\n[code]\n pipelines:\n - name: demo_pipeline\n steps:\n - name: bld_image\n type: DockerBuild\n configuration:\n dockerFileLocation: .\n dockerFileName: Dockerfile\n dockerImageName: docker.artprod.mycompany.com/gosvc # replace with your fully qualified Docker registry/image name\n dockerImageTag: ${run_number}\n inputResources:\n - name: gosvc_app\n integrations:\n - name: MyArtifactory\n[/code]\n###### Build a Docker image with dockerOptions\nThis example demonstrates use of the `dockerOptions` tag to set the `build-arg` option for the Docker command. An environment variable named `build_number_env_variable` is dynamically set to the pipeline's run number. The example assumes the environment variable is used in the Dockerfile commands.\n[code]\n pipelines:\n - name: demo_pipeline\n steps:\n - name: bld_image\n type: DockerBuild\n configuration:\n dockerFileLocation: .\n dockerFileName: Dockerfile\n dockerImageName: docker.artprod.mycompany.com/gosvc # replace with your fully qualified Docker registry/image name\n dockerImageTag: ${run_number}\n dockerOptions: --build-arg build_number_env_variable=${run_number} \n inputResources:\n - name: gosvc_app\n integrations:\n - name: MyArtifactory\n[/code]\n###### Build a Docker image with a private base image\nThis example builds a Docker image that relies on a private base image stored in an Artifactory Docker repository.\n[code]\n pipelines:\n - name: demo_pipeline\n steps:\n - name: bld_image\n type: DockerBuild\n configuration:\n dockerFileLocation: .\n dockerFileName: Dockerfile\n dockerImageName: docker.artprod.mycompany.com/gosvc # replace with your fully qualified Docker registry/image name\n dockerImageTag: ${run_number}\n inputResources:\n - name: gosvc_app\n - name: base_image\n integrations:\n - name: MyArtifactory\n[/code]\n###### Build a Docker image with files outside the current path\nThis example demonstrates building a Docker image that includes files outside of the current path. It pulls icon files stored in an Artifactory repository for integration art named `my-local-repo`. It is assumed that the Dockerfile has a command that will include the files in `/icons` into the image.\n[code]\n pipelines:\n - name: demo_pipeline\n steps:\n - name: bld_image\n type: DockerBuild\n configuration:\n dockerFileLocation: .\n dockerFileName: Dockerfile\n dockerImageName: docker.artprod.mycompany.com/gosvc # replace with your fully qualified Docker registry/image name\n dockerImageTag: ${run_number}\n inputResources:\n - name: gosvc_app\n - name: icon_files\n integrations:\n - name: MyArtifactory\n[/code]\n##### How it Works\nWhen you use the **DockerBuild** native step in a pipeline, it performs the following functions in the background:\n * cp (if there is a FileSpec input, copy those files to the root of the cloned GitRepo input)\n * docker build\n * add_run_variables (add several variables that are later used when pushing the Docker image or publishing build info)\n * jfrog rt build-collect-env (collect environment information to be later published as part of build info)\n * add_run_files (save information collected for build info)\n\n\n\n Document url for reference - https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps"
101,uploadartifact,"The **UploadArtifact** native step uploads artifacts to Artifactory. Optionally, it can also publish build information to Artifactory and trigger Xray scans.\nThis step utilizes the JFrog CLI to upload an artifact to Artifactory. The file(s) may be provided in a FileSpec, if already in Artifactory, or RemoteFile or GitRepo input.\n##### YAML Schema\nThe YAML schema for UploadArtifact native step is as follows:\n **UploadArtifact**\n[code]\n pipelines: \n - name: <string>\n steps:\n - name: <string>\n type: UploadArtifact\n configuration:\n targetPath: <string> #required\n sourcePath: <string> #optional\n properties: <string> #optional\n regExp: <boolean> #optional\n flat: <boolean> #optional\n module: <string> #optional\n deb: <string> #optional\n recursive: <boolean> #optional\n dryRun: <boolean> #optional\n symlinks: <boolean> #optional\n explode: <boolean> #optional\n exclusions: <string> #optional\n includeDirs: <boolean> #optional\n syncDeletes: <string> #optional\n forceXrayScan: <boolean> #optional\n failOnScan: <boolean> # default true\n autoPublishBuildInfo: <boolean> #optional\n inputResources:\n - name: myGitRepo \n - name: artifactoryFileSpec \n - name: myRemoteFile \n outputResources:\n - name: myFileSpec\n - name: myBuildInfo\n integrations:\n - name: myArtifactory \n execution:\n onStart:\n - echo ""Preparing for work...""\n onSuccess:\n - echo ""Job well done!""\n onFailure:\n - echo ""uh oh, something went wrong""\n onComplete: #always\n - echo ""Cleaning up some stuff""\n \n[/code]\n##### Tags\n###### name\nAn alphanumeric string (underscores are permitted) that identifies the step.\n###### type\nMust be `UploadArtifact` for this step type.\n###### configuration\nSpecifies all configuration selections for the step's execution environment. This step inherits the Bash/ PowerShell step configuration tags, including these pertinent tags:\nTag\n **Description of usage**\nRequired/Optional \n`integrations`\nMust specify an Artifactory Integration.\nRequired \n`inputResources`\nMay specify a GitRepo, FileSpec, or RemoteFile resource containing the file(s) to be uploaded. One of each type may be specified.\nOptional \n`outputResources`\nMust specify a BuildInfo resource if `autoPublishBuildInfo` is set as `true`.\nIf `JFROG_CLI_BUILD_NAME` or `JFROG_CLI_BUILD_NUMBER` is set as an environment variable for the pipeline or the step, that name and/or number is used for the output BuildInfo. Otherwise, the default `buildName` and `buildNumber` are `$pipeline_name` and `$run_number.`\nMay also specify a FileSpec resource to be updated with the pattern and properties of the uploaded Artifact.\nMay be required \nIn addition, these tags can be defined to support the step's native operation:\nTag\n **Description of usage**\nRequired/Optional \ntargetPath\nPath to upload the files, including repository name.\nRequired \n`sourcePath`\nFiles to upload. If this is a relative path pattern, it is relative to the root of a GitRepo/FileSpec/RemoteFile input.\nDefault is `*` when `regExp` is `false` and `.*` when `regExp` is `true`.\nOptional \n`properties`\nSemi-colon separated properties for the uploaded artifact. For example: `myFirstProperty=one;mySecondProperty=two`.\nProperties `pipelines_step_name`, `pipelines_run_number`, `pipelines_step_id`, `pipelines_pipeline_name`, `pipelines_step_url`, `pipelines_step_type`, and `pipelines_step_platform` will also be added.\nOptional \n`regExp`\nWhen set as `true`, regular expressions are used in other parameters, such as `sourcePath`, instead of wildcards. Expressions must be in parentheses.\nDefault is `false`.\nOptional \n`flat`\nWhen set as `true`, the uploaded files are flattened, removing the directory structure.\nDefault is `false`.\nOptional \n`module`\nA module name for the Build Info.\nOptional \n`deb`\nA `distribution/component/architecture` for Debian packages. If the distribution, component, or architecture includes a / it must be double-escaped, For example: `distribution/my\\\/component/architecture` for a `my/component` component.\nOptional \n`recursive`\nWhen set as `false`, do not upload any matches in subdirectories.\nDefault is true.\nOptional \n`dryRun`\nWhen set as `true`, nothing is uploaded.\nDefault is `false`.\nOptional \n`symlinks`\nWhen set as `true`, symlinks matching the other criteria are uploaded.\nDefault is `false`.\nOptional \n`explode`\nWhen set as `true` and the uploaded Artifact is an archive, the archive is expanded.\nDefault is `false`.\nOptional \n`exclusions`\nSemi-colon separated patterns to exclude.\nOptional \n`includeDirs`\nWhen set as `true`, empty directories matching the criteria are uploaded.\nDefault is `false`.\nOptional \n`syncDeletes`\nA path under which to delete any existing files in Artifactory.\nOptional \n`forceXrayScan`\nWhen set as `true`, forces an Xray scan after publishing to Artifactory.\nDefault is `false`.\nOptional \n`failOnScan`\nWhen set as `true`, and when the Xray Policy Rule Fail Build checkbox is checked, a failed Xray scan will result in a failure of the step.Creating Xray Policies and Rules\nDefault is `true`.\nOptional \n`autoPublishBuildInfo`\nWhen set as `true`, publishes build info to Artifactory.\nDefault is `false`.\nOptional \n###### execution\nDeclares collections of shell command sequences to perform for pre- and post-execution phases:\nTag\n **Description of usage**\nRequired/Optional \n`onStart`\nCommands to execute in advance of the native operation\nOptional \n`onSuccess`\nCommands to execute on successful completion\nOptional \n`onFailure`\nCommands to execute on failed completion\nOptional \n`onComplete`\nCommands to execute on any completion\nOptional \nThe actions performed for the `onExecute` phase are inherent to this step type and may not be overridden.\n### Note\n`onExecute`, `onStart`, `onSuccess`, `onFailure`, and `onComplete` are reserved keywords. Using these keywords in any other context in your execution scripts can cause unexpected behavior.\n##### Examples\nThe following examples show a few ways in which a UploadArtifact step can be configured.\n###### Uploading an Artifact to Another Repository using a FileSpec Resource\nThe most basic form of UploadArtifact. Uses all default values. This step will download the file matching the FileSpec and upload it to the location in `targetPath`. The optional output FileSpec resource will be updated with the `targetPath` and the default properties added to the uploaded artifact.\n **UploadArtifact**\n[code]\n pipelines: \n - name: uploadArtifactPipeline\n steps:\n - name: uploadArtifactStep\n type: UploadArtifact\n configuration:\n targetPath: my-repository/myDirectory/myFile.txt\n integrations:\n - name: myArtifactoryIntegration\n inputResources:\n - name: myInputFileSpec\n outputResources:\n - name: myOutputFileSpec\n \n[/code]\n###### Uploading an Artifact from a RemoteFile Resource\nIn this example, the input is a RemoteFile resource. Otherwise, this is very similar to the previous example with an input that downloads a file that is then uploaded and an optional FileSpec output updated for the uploaded file.\n **UploadArtifact**\n[code]\n pipelines: \n - name: uploadArtifactPipeline\n steps:\n - name: uploadArtifactStep\n type: UploadArtifact\n configuration:\n targetPath: my-repository/myDirectory/myFile.txt\n integrations:\n - name: myArtifactoryIntegration\n inputResources:\n - name: myInputRemoteFile\n outputResources:\n - name: myOutputFileSpec\n \n[/code]\n###### Publish Build Info and Trigger Xray Scan\nIn this example, build info is published as part of the UploadArtifact step and an Xray scan is triggered.\n **UploadArtifact**\n[code]\n pipelines: \n - name: uploadArtifactPipeline\n steps:\n - name: uploadArtifactStep\n type: UploadArtifact\n configuration:\n targetPath: my-repository/myDirectory/myFile.txt\n autoPublishBuildInfo: true\n forceXrayScan: true\n integrations:\n - name: myArtifactoryIntegration\n inputResources:\n - name: myFileSpec\n outputResources:\n - name: myBuildInfo\n \n[/code]\n##### How it Works\nWhen you use the **UploadArtifact** native step in a pipeline, it performs the following functions in the background:\n * jfrog rt config (configure JFrog CLI with the integration listed in the yaml)\n * jfrog rt use (configure JFrog CLI to use the config for the integration listed in the yaml)\n * mkdir (create a directory to use as the root of relative paths in the following actions)\n * cp (copy the FileSpec, RemoteFile, or GitRepo files to the new directory, limit one of each input type)\n * jfrog rt upload (upload the Artifact)\n * write_output (update the FileSpec output resource with the uploaded pattern and properties)\n * add_run_variables (save information in run state for future steps to reference)\n * jfrog rt build-collect-env (collect the build environment, preparing for build publish)\n * jfrog rt build-publish (publish the build, only if autoPublishBuildInfo is true)\n * write_output (update the BuildInfo output resource with the published name/number)\n * jfrog rt build-scan (if forceXrayScan is true)\n * add_run_files (adds build info to run state)\n\n\n\n Document url for reference - https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps"
234,conditional-workflows,"Conditional workflow in Pipelines enables you to choose if a step executes or skips based on certain conditions set for the previous upstream step. This means, when the workflow reaches a conditional step, it can choose different workflow paths based on the step’s status. This provides more flexibility in the execution logic of a pipeline.\n### Note\nConditional workflow can be applied to any Pipelines step.\n#### Step Status Conditional Workflow\nWith the `status` conditional workflow, you can configure a step to execute only if an input step’s status, during its current run, is satisfied. You can configure any number of statuses for a step.\n **YAML Schema**\n[code]\n steps:\n - name: <step_name>\n type: <step_type>\n configuration:\n allowFailure: boolean #optional\n inputSteps:\n - name: <step_name>\n status:\n - <terminal_status>\n - <terminal_status>\n - <terminal_status>\n[/code]\n### Note\nIt is important to note that the status of an input step in the current run only is considered for conditional workflows. If a step is not part of the current run, it is always assumed that the condition for that input step is met.\n##### Adding Conditional Workflow for Steps\nTo add a conditional workflow for a step:\n 1. In the `inputSteps` section of a step, add the `status` property.\n 2. Add any of these values:\n * `success`\n * `failure`\n * `error`\n * `cancelled`\n * `skipped`\n * `unstable`\n * `timeout`\n### Note\nEnsure that the values are in lowercase and use the same spelling as shown above. Any deviation from this will cause the pipeline source sync to fail.\n **Example** : In this example:\n * step_B has only one status: `success`\n * step_C has multiple statuses: `failure`, `skipped`, `cancelled`\n[code] - name: step_A\n type: Bash\n configuration:\n allowFailure: boolean #optional\n inputSteps:\n - name: step_B\n status:\n - success\n - name: step_C\n status:\n - failure\n - skipped\n - cancelled\n[/code]\n 3. **allowFailure**\n **Optional** : If you do not want a particular step to contribute to the final status of the run, you can add `allowFailure: true` to the `configuration` section of that step. When this option is used, even when a step fails or is skipped, the final status of the run is not affected.\nFor example, a pipeline contains two steps S1 and S2: \n * **Scenario 1** : Step S2 is a cleanup step and its status is irrelevant. The overall run status should be determined by S1’s status and S2’s status should be ignored. In this case, add `allowFailure: true` to S2, since this is purely a cleanup step and only S1’s status should be taken into consideration.\n * **Scenario 2** : Step S1 has been configured to fail as part of the workflow. However S2 runs even if S1 fails and the run is not to be considered a failure. The run’s final status should mirror S2’s status since S1’s status does not interrupt the flow. In this case, add `allowFailure: true` to S1 since S1’s failure is a known possibility and expected, and that should not affect the final status of the run.\n\n\nFor more examples, see allowFailure Examples.\n#### Run Variable Conditional Workflow\nCreate a condition based on the values of `add_run_variables` environment variable, so that a step can be skipped based on dynamically set variables before it gets assigned to a node.\n### Note\nWhen using a `condition`, boolean values must be enclosed in quotes.\n **Examples** :\n * `condition: 'trigger == ""true""'`\n * `condition: ""trigger == 'true'""`\n * `condition: trigger == 'true'`\n * `condition: trigger == ""true""`\n\n\n### Note\nPipelines environment variables cannot be added as a condition.Pipelines Environment Variables\n **YAML Schema**\n[code]\n steps:\n - name: <step_name>\n type: <step_type>\n execution:\n onExecute:\n - add_run_variables 'key=value'\n - name: <step_name>\n type: <step_type>\n configuration:\n condition: 'key == value' // Any logical boolean expression that results in a boolean\n inputSteps:\n - name: <step_name>\n[/code]\n **Example**\n[code]\n pipelines:\n - name: Example\n steps:\n - name: step1\n type: Bash\n execution:\n onExecute:\n - echo 'step1'\n - add_run_variables 'var1=1'\n - name: step2\n type: Bash\n configuration:\n condition: 'var1 == 1' // Any logical boolean expression that results to a boolean\n inputSteps:\n - name: step1\n execution:\n onExecute:\n - echo 'success'\n[/code]\n#### Environment Variables Conditional Workflow\nCreate a conditional workflow based on environment variables defined in the configuration section of your pipelines YAML file. The step executes when the declared condition is met.\n **Example**\n[code]\n pipelines:\n - name: myPipelines\n configuration:\n environmentVariables:\n readOnly:\n new_env:\n default: 1\n allowCustom: true\n steps:\n - name: step1\n type: Bash\n configuration:\n environmentVariables:\n new_env:\n default: 2\n #allowCustom: true\n condition: new_env == 2\n execution:\n onExecute:\n - echo $new_env\n - name: step2\n type: Bash\n configuration: \n condition: new_env == 1\n inputSteps:\n - name: step1\n execution:\n onExecute:\n - echo 'success'\n[/code]\n#### newVersionOnly Conditional Workflow\nWhenever a resource undergoes a change, its version is updated and the dependent step is triggered. This is the default behavior for all input resources. To skip steps in a run when input resources are not updated, add the `newVersionOnly` tag and set it as `true`. During a run, the step is triggered only when the resource is updated. If the resource is not updated, the step is skipped and all the downstream steps are skipped as well.\n **Example 1 - newVersionOnly**\n[code]\n pipelines:\n - name: java_pipeline\n steps:\n - name: step_1\n type: Bash\n configuration:\n inputResources:\n - name: my_app_repo\n - newVersionOnly: true\n execution:\n onExecute:\n - pushd $res_my_app_repo_resourcePath\n - ./execute.sh\n - popd\n[/code]\n **Example 2 - newVersionOnly**\n[code]\n resources:\n - name: new_resource\n type: PropertyBag\n configuration:\n runNumber: 0\n \n pipelines:\n - name: pipeline_01\n steps:\n - name: step_input\n type: Bash\n configuration:\n outputResources:\n - name: new_resource\n execution:\n onExecute:\n #- write_output new_resource runNumber=${run_number}\n - echo ""test""\n \n - name: step1 \n type: Bash\n configuration:\n inputResources:\n - name: new_resource\n newVersionOnly: true\n execution:\n onExecute:\n - echo ""test""\n \n - name: step2\n type: Bash\n configuration:\n inputResources:\n - name: new_resource\n execution:\n onExecute:\n - echo ""test""\n[/code] \n#### Viewing Run Logs\nWhen you run a pipeline, in addition to the other logs, the logs for steps with conditional workflow provide information about the skipped steps.\nTo view these logs, go to the Pipeline Run Logs view, click the skipped step to display the logs for the current run. \n#### Examples\n##### Example 1\nIn this example:\n * Step B is triggered only if step A succeeds (default behavior), and step C is triggered only if step A is in failed, error, or timeout status.\n * Step B does not need any special configuration as the default behavior is to trigger a dependent step if the previous step succeeds.\n * Step A also does not need any special configuration since the step itself does not decide the downstream workflow path.\n\n \n **YAML**\n[code]\n - name: demo_conditional\n steps:\n - name: step_A\n type: Bash\n configuration:\n inputResources:\n - name: script_conditional\n execution:\n onExecute:\n - echo ""Executing step_A""\n - printenv\n \n - name: step_B\n type: Bash\n configuration:\n inputSteps:\n - name: step_A\n execution:\n onExecute:\n - echo ""Executing step_B""\n - printenv\n \n - name: step_C\n type: Bash\n configuration:\n inputSteps:\n - name: step_A\n status:\n - failure\n - error\n - timeout\n execution:\n onExecute:\n - echo ""Executing step_C""\n - printenv\n[/code]\n##### Example 2\nIn this example, Step S is triggered if Step Q succeeds and Step R fails. However, if both Step Q and Step R succeed or fail during the run, Step S is not triggered and it is skipped. \n **YAML**\n[code]\n - name: step_S\n type: Bash\n configuration:\n inputSteps:\n - name: step_Q\n status:\n - success\n - name: step_R\n status:\n - failure\n execution:\n onExecute:\n - echo ""Executing step_S""\n - printenv\n[/code]\n##### Example 3\nIn this example, Step O is triggered if Step M succeeds and Step N fails. However, since Step N is not part of the current run, Step O is triggered when Step M succeeds and Step N's status is ignored. \n **YAML**\n[code]\n - name: step_O\n type: Bash\n configuration:\n inputSteps:\n - name: step_M\n status:\n - success\n - name: step_N\n status:\n - failure\n execution:\n onExecute:\n - echo ""Executing step_O""\n - printenv\n[/code]\n##### Example 4 - Using Environment Variable\nThe `step_<inputStepName>_statusName`, which is an environment variable that is automatically made available at runtime, can be used in conjunction with conditional workflows. This `step_<inputStepName>_statusName` environment variable is useful for fetching the status of any input step, especially when working with Jenkins.\n **YAML**\n[code]\n resources:\n - name: script_gh\n type: GitRepo\n configuration:\n path: jfrog/sample-script\n gitProvider: myGithub\n branches:\n include: ^{{gitBranch}}$\n \n pipelines: \n - name: simple_jenkins_demo\n steps:\n - name: jenkins\n type: Jenkins\n configuration:\n inputResources:\n - name: script_gh\n jenkinsJobName: testPipeline\n integrations:\n - name: myJenkins\n \n - name: step_A\n type: Bash\n configuration:\n inputSteps:\n - name: jenkins\n status:\n - failure\n - error\n - timeout\n execution:\n onExecute:\n - echo ""Executing step_A""\n - if [ $step_jenkins_statusName == ""failure"" ]; then echo ""Do something""; fi\n - if [ $step_jenkins_statusName == ""error"" ]; then echo ""Do something else""; fi\n \n - name: simple_conditional_B\n type: Bash\n configuration:\n inputSteps:\n - name: jenkins\n status:\n - failure\n - error\n execution:\n onExecute:\n - echo ""Executing simple_conditional_B""\n - printenv\n \n[/code]\n#### allowFailure Examples\n##### Example 1\nStep1 is configured for success and step2 for failure. Step2 is allowed to run when step1 fails and the final status of the run is success.\n[code]\n pipelines:\n - name: PIPE_9455_Workflow_03\n steps:\n - name: step1\n type: Bash\n execution:\n onExecute:\n - echo 'step1'\n - name: step2\n type: Bash\n configuration:\n allowFailure: true\n inputSteps:\n - name: step1\n status:\n - success\n - error\n - failure\n - timeout\n execution:\n onExecute:\n - echo 'success'\n - exit 1\n[/code]\n##### Example 2\nStep1 is configured for failure and step2 for success. Step2 is allowed to run when step1 fails and the final status of the run is success.\n[code]\n pipelines:\n - name: PIPE_9455_Workflow_05\n steps:\n - name: step1\n type: Bash\n configuration:\n allowFailure: true\n execution:\n onExecute:\n - echo 'step1'\n - exit 1\n - name: step2\n type: Bash\n configuration:\n inputSteps:\n - name: step1\n status:\n - success\n - error\n - failure\n - timeout\n execution:\n onExecute:\n - echo 'success'\n[/code]\n##### Example 3\nStep1 is configured for success and step2 for failure. When triggered, the final status of the run is failure.\n[code]\n pipelines:\n - name: PIPE_9455_Workflow_03\n steps:\n - name: step1\n type: Bash\n configuration:\n allowFailure: true\n execution:\n onExecute:\n - echo 'step1'\n - name: step2\n type: Bash\n configuration:\n inputSteps:\n - name: step1\n status:\n - success\n - error\n - failure\n - timeout\n execution:\n onExecute:\n - echo 'failure'\n - exit 1\n[/code]\n\n Document url for reference - https://jfrog.com/help/r/jfrog-pipelines-documentation/pipelines-steps"


In [44]:
# We have extarcted the data and we can save as CSV
final_data.to_csv('./jFrog_pipline.csv')

We have extarcted our data and we can use `jFrog_pipline.csv` for training