diff --git a/python-dsl/LICENSE b/python-dsl/LICENSE new file mode 100644 index 00000000..0ad25db4 --- /dev/null +++ b/python-dsl/LICENSE @@ -0,0 +1,661 @@ + GNU AFFERO GENERAL PUBLIC LICENSE + Version 3, 19 November 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU Affero General Public License is a free, copyleft license for +software and other kinds of works, specifically designed to ensure +cooperation with the community in the case of network server software. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +our General Public Licenses are intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + Developers that use our General Public Licenses protect your rights +with two steps: (1) assert copyright on the software, and (2) offer +you this License which gives you legal permission to copy, distribute +and/or modify the software. + + A secondary benefit of defending all users' freedom is that +improvements made in alternate versions of the program, if they +receive widespread use, become available for other developers to +incorporate. Many developers of free software are heartened and +encouraged by the resulting cooperation. However, in the case of +software used on network servers, this result may fail to come about. 
+The GNU General Public License permits making a modified version and +letting the public access it on a server without ever releasing its +source code to the public. + + The GNU Affero General Public License is designed specifically to +ensure that, in such cases, the modified source code becomes available +to the community. It requires the operator of a network server to +provide the source code of the modified version running there to the +users of that server. Therefore, public use of a modified version, on +a publicly accessible server, gives the public access to the source +code of the modified version. + + An older license, called the Affero General Public License and +published by Affero, was designed to accomplish similar goals. This is +a different license, not a version of the Affero GPL, but Affero has +released a new version of the Affero GPL which permits relicensing under +this license. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU Affero General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. 
+ + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. + + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. + + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. 
+ + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. 
The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. + + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. 
+ + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. 
+ + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. + + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. 
This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. + + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. 
For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. + + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. 
Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. 
+ + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. 
+ + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. 
Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. 
+ + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. 
+ + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. 
You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Remote Network Interaction; Use with the GNU General Public License. 
+ + Notwithstanding any other provision of this License, if you modify the +Program, your modified version must prominently offer all users +interacting with it remotely through a computer network (if your version +supports such interaction) an opportunity to receive the Corresponding +Source of your version by providing access to the Corresponding Source +from a network server at no charge, through some standard or customary +means of facilitating copying of software. This Corresponding Source +shall include the Corresponding Source for any work covered by version 3 +of the GNU General Public License that is incorporated pursuant to the +following paragraph. + + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU General Public License into a single +combined work, and to convey the resulting work. The terms of this +License will continue to apply to the part which is the covered work, +but the work with which it is combined will remain governed by version +3 of the GNU General Public License. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU Affero General Public License from time to time. Such new versions +will be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU Affero General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU Affero General Public License, you may choose any version ever published +by the Free Software Foundation. 
+ + If the Program specifies that a proxy can decide which future +versions of the GNU Affero General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS +THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF +DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD +PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), +EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + 17. Interpretation of Sections 15 and 16. 
+ + If the disclaimer of warranty and limitation of liability provided +above cannot be given local legal effect according to their terms, +reviewing courts shall apply local law that most closely approximates +an absolute waiver of all civil liability in connection with the +Program, unless a warranty or assumption of liability accompanies a +copy of the Program in return for a fee. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +state the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + <one line to give the program's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU Affero General Public License as published + by the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Affero General Public License for more details. + + You should have received a copy of the GNU Affero General Public License + along with this program. If not, see <https://www.gnu.org/licenses/>. + +Also add information on how to contact you by electronic and paper mail. + + If your software can interact with users remotely through a computer +network, you should also make sure that it provides a way for users to +get its source. For example, if your program is a web application, its +interface could display a "Source" link that leads users to an archive +of the code. 
There are many ways you could offer source, and different +solutions will be better for different programs; see section 13 for the +specific requirements. + + You should also get your employer (if you work as a programmer) or school, +if any, to sign a "copyright disclaimer" for the program, if necessary. +For more information on this, and how to apply and follow the GNU AGPL, see +<https://www.gnu.org/licenses/>. diff --git a/python-dsl/MANIFEST.in b/python-dsl/MANIFEST.in new file mode 100644 index 00000000..d9bf267b --- /dev/null +++ b/python-dsl/MANIFEST.in @@ -0,0 +1,11 @@ +include LICENSE +include README.md +include pyproject.toml +recursive-include codepathfinder *.py +recursive-exclude tests * +recursive-exclude htmlcov * +recursive-exclude .pytest_cache * +recursive-exclude .mypy_cache * +recursive-exclude .ruff_cache * +global-exclude *.pyc +global-exclude __pycache__ diff --git a/python-dsl/README.md b/python-dsl/README.md index 1a481d91..7b10aa76 100644 --- a/python-dsl/README.md +++ b/python-dsl/README.md @@ -1,6 +1,14 @@ # Code-Pathfinder Python DSL -Python DSL for defining security patterns in code-pathfinder. +Python DSL for defining security patterns in Code Pathfinder - an open-source security suite combining structural code analysis with AI-powered vulnerability detection. + +**Project Goals:** +- Real-time IDE integration bringing security insights directly into your editor +- AI-assisted analysis leveraging LLMs to understand context and identify vulnerabilities +- Unified workflow coverage from local development to CI/CD pipelines +- Flexible reporting supporting DefectDojo, GitHub Advanced Security, SARIF, and other platforms + +**Documentation**: https://codepathfinder.dev/ ## Installation @@ -8,203 +16,35 @@ Python DSL for defining security patterns in code-pathfinder. 
pip install codepathfinder ``` -## Quick Start - -```python -from codepathfinder import rule, calls, variable - -@rule(id="code-injection", severity="critical", cwe="CWE-94") -def detect_eval(): - """Detects dangerous code execution via eval/exec""" - return calls("eval", "exec") - -@rule(id="user-input", severity="high") -def detect_user_input(): - """Detects user input variables""" - return variable("user_input") -``` - -## Core Matchers - -### `calls(*patterns)` - -Matches function/method calls. +## Quick Example ```python -from codepathfinder import calls - -# Exact match -calls("eval") - -# Multiple patterns -calls("eval", "exec", "compile") - -# Wildcard patterns -calls("request.*") # Matches request.GET, request.POST, etc. -calls("*.execute") # Matches cursor.execute, conn.execute, etc. -``` - -### `variable(pattern)` - -Matches variable references. - -```python -from codepathfinder import variable - -# Exact match -variable("user_input") - -# Wildcard patterns -variable("user_*") # Matches user_input, user_data, etc. -variable("*_id") # Matches user_id, post_id, etc. -``` - -## Dataflow Analysis - -### `flows(from_sources, to_sinks, sanitized_by=None, propagates_through=None, scope="global")` - -Tracks tainted data flow from sources to sinks for OWASP Top 10 vulnerability detection. 
- -```python -from codepathfinder import flows, calls, propagates - -# SQL Injection -flows( - from_sources=calls("request.GET", "request.POST"), - to_sinks=calls("execute", "executemany"), - sanitized_by=calls("quote_sql"), - propagates_through=[ - propagates.assignment(), - propagates.function_args(), - ], - scope="global" -) - -# Command Injection -flows( - from_sources=calls("request.POST"), - to_sinks=calls("os.system", "subprocess.call"), - sanitized_by=calls("shlex.quote"), - propagates_through=[ - propagates.assignment(), - propagates.function_args(), - propagates.function_returns(), - ] -) - -# Path Traversal -flows( - from_sources=calls("request.GET"), - to_sinks=calls("open", "os.path.join"), - sanitized_by=calls("os.path.abspath"), - propagates_through=[propagates.assignment()], - scope="local" -) -``` +from codepathfinder import rule, flows, calls +from codepathfinder.presets import PropagationPresets -**Parameters:** -- `from_sources`: Source matcher(s) where taint originates (e.g., user input) -- `to_sinks`: Sink matcher(s) for dangerous functions -- `sanitized_by` (optional): Sanitizer matcher(s) that neutralize taint -- `propagates_through` (optional): List of propagation primitives (EXPLICIT!) -- `scope`: `"local"` (intra-procedural) or `"global"` (inter-procedural, default) - -### Propagation Primitives - -Propagation primitives define HOW taint flows through code: - -```python -from codepathfinder import propagates - -# Phase 1 (Available Now): -propagates.assignment() # x = tainted -propagates.function_args() # func(tainted) -propagates.function_returns() # return tainted -``` - -**Important:** Propagation is EXPLICIT - you must specify which primitives to enable. No defaults are applied. - -## Rule Decorator - -The `@rule` decorator marks functions as security rules with metadata. 
- -```python -from codepathfinder import rule, calls - -@rule( - id="sqli-001", - severity="critical", - cwe="CWE-89", - owasp="A03:2021" -) +@rule(id="sql-injection", severity="critical", cwe="CWE-89") def detect_sql_injection(): """Detects SQL injection vulnerabilities""" - return calls("execute", "executemany", "raw") -``` - -**Parameters:** -- `id` (str): Unique rule identifier -- `severity` (str): `critical` | `high` | `medium` | `low` -- `cwe` (str, optional): CWE identifier (e.g., "CWE-89") -- `owasp` (str, optional): OWASP category (e.g., "A03:2021") - -The function docstring becomes the rule description. - -## JSON IR Output - -Rules serialize to JSON Intermediate Representation (IR) for the Go executor: - -```python -from codepathfinder import rule, calls -import json - -@rule(id="test", severity="high") -def my_rule(): - return calls("eval") - -# Serialize to JSON IR -ir = my_rule.execute() -print(json.dumps(ir, indent=2)) + return flows( + from_sources=calls("request.GET", "request.POST"), + to_sinks=calls("execute", "executemany"), + sanitized_by=calls("quote_sql"), + propagates_through=PropagationPresets.standard(), + scope="global" + ) ``` -Output: -```json -{ - "rule": { - "id": "test", - "name": "my_rule", - "severity": "high", - "cwe": null, - "owasp": null, - "description": "" - }, - "matcher": { - "type": "call_matcher", - "patterns": ["eval"], - "wildcard": false, - "match_mode": "any" - } -} -``` - -## Development - -```bash -# Install with dev dependencies -pip install -e ".[dev]" - -# Run tests -pytest +## Features -# Format code -black codepathfinder/ tests/ +- **Matchers**: `calls()`, `variable()` for pattern matching +- **Dataflow Analysis**: `flows()` for source-to-sink taint tracking +- **Propagation**: Explicit propagation primitives (assignment, function args, returns) +- **Logic Operators**: `And()`, `Or()`, `Not()` for complex rules +- **JSON IR**: Serializes to JSON for Go executor integration -# Lint -ruff check codepathfinder/ 
tests/ +## Documentation -# Type check -mypy codepathfinder/ -``` +For detailed documentation, visit https://codepathfinder.dev/ ## Requirements @@ -213,4 +53,4 @@ mypy codepathfinder/ ## License -MIT +AGPL-3.0 - GNU Affero General Public License v3 diff --git a/python-dsl/pyproject.toml b/python-dsl/pyproject.toml index 397222cb..02ccb1ba 100644 --- a/python-dsl/pyproject.toml +++ b/python-dsl/pyproject.toml @@ -8,7 +8,30 @@ version = "1.0.0" description = "Python DSL for code-pathfinder security patterns" readme = "README.md" requires-python = ">=3.8" -license = {text = "MIT"} +license = {text = "AGPL-3.0"} +authors = [{name = "code-pathfinder contributors"}] +classifiers = [ + "Development Status :: 4 - Beta", + "Intended Audience :: Developers", + "License :: OSI Approved :: GNU Affero General Public License v3", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.8", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Topic :: Security", + "Topic :: Software Development :: Testing", +] + +[project.optional-dependencies] +dev = [ + "pytest>=7.0.0", + "pytest-cov>=4.0.0", + "black>=23.0.0", + "mypy>=1.0.0", + "ruff>=0.1.0", +] [tool.pytest.ini_options] testpaths = ["tests"] diff --git a/python-dsl/setup.py b/python-dsl/setup.py index 650725d6..677b895c 100644 --- a/python-dsl/setup.py +++ b/python-dsl/setup.py @@ -25,6 +25,7 @@ url="https://github.com/shivasurya/code-pathfinder", packages=find_packages(exclude=["tests", "tests.*"]), python_requires=">=3.8", + license="AGPL-3.0", install_requires=[ # No external dependencies (stdlib only!) 
], @@ -40,7 +41,7 @@ classifiers=[ "Development Status :: 4 - Beta", "Intended Audience :: Developers", - "License :: OSI Approved :: MIT License", + "License :: OSI Approved :: GNU Affero General Public License v3", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", diff --git a/sourcecode-parser/cmd/ci.go b/sourcecode-parser/cmd/ci.go index 3a2fa798..b935c6cb 100644 --- a/sourcecode-parser/cmd/ci.go +++ b/sourcecode-parser/cmd/ci.go @@ -1,19 +1,270 @@ package cmd import ( + "encoding/json" "fmt" + "log" + "os" + sarif "github.com/owenrumney/go-sarif/v2/sarif" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/dsl" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/graph" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/graph/callgraph" "github.com/spf13/cobra" ) var ciCmd = &cobra.Command{ Use: "ci", - Short: "CI mode - Python DSL implementation in progress", + Short: "CI mode with SARIF or JSON output for CI/CD integration", + Long: `CI mode for integrating security scans into CI/CD pipelines. + +Outputs results in SARIF or JSON format for consumption by CI tools. + +Examples: + # Generate SARIF report + pathfinder ci --rules rules/owasp_top10.py --project . --output sarif > results.sarif + + # Generate JSON report + pathfinder ci --rules rules/owasp_top10.py --project . 
--output json > results.json`, RunE: func(cmd *cobra.Command, args []string) error { - return fmt.Errorf("CI command not yet implemented in new architecture") + rulesPath, _ := cmd.Flags().GetString("rules") + projectPath, _ := cmd.Flags().GetString("project") + outputFormat, _ := cmd.Flags().GetString("output") + + if rulesPath == "" { + return fmt.Errorf("--rules flag is required") + } + + if projectPath == "" { + return fmt.Errorf("--project flag is required") + } + + if outputFormat != "sarif" && outputFormat != "json" { + return fmt.Errorf("--output must be 'sarif' or 'json'") + } + + // Build code graph (AST) + log.Printf("Building code graph from %s...\n", projectPath) + codeGraph := graph.Initialize(projectPath) + if len(codeGraph.Nodes) == 0 { + return fmt.Errorf("no source files found in project") + } + log.Printf("Code graph built: %d nodes\n", len(codeGraph.Nodes)) + + // Build module registry + log.Printf("Building module registry...\n") + registry, err := callgraph.BuildModuleRegistry(projectPath) + if err != nil { + log.Printf("Warning: failed to build module registry: %v\n", err) + registry = callgraph.NewModuleRegistry() + } + + // Build callgraph + log.Printf("Building callgraph...\n") + cg, err := callgraph.BuildCallGraph(codeGraph, registry, projectPath) + if err != nil { + return fmt.Errorf("failed to build callgraph: %w", err) + } + log.Printf("Callgraph built: %d functions, %d call sites\n", + len(cg.Functions), countTotalCallSites(cg)) + + // Load Python DSL rules + log.Printf("Loading rules from %s...\n", rulesPath) + loader := dsl.NewRuleLoader(rulesPath) + rules, err := loader.LoadRules() + if err != nil { + return fmt.Errorf("failed to load rules: %w", err) + } + log.Printf("Loaded %d rules\n", len(rules)) + + // Execute rules against callgraph + log.Printf("Running security scan...\n") + allDetections := make(map[string][]dsl.DataflowDetection) + totalDetections := 0 + for _, rule := range rules { + detections, err := 
loader.ExecuteRule(&rule, cg) + if err != nil { + log.Printf("Error executing rule %s: %v\n", rule.Rule.ID, err) + continue + } + + if len(detections) > 0 { + allDetections[rule.Rule.ID] = detections + totalDetections += len(detections) + } + } + + log.Printf("Scan complete. Found %d vulnerabilities.\n", totalDetections) + log.Printf("Generating %s output...\n", outputFormat) + + // Generate output + if outputFormat == "sarif" { + return generateSARIFOutput(rules, allDetections) + } + return generateJSONOutput(rules, allDetections) }, } +func generateSARIFOutput(rules []dsl.RuleIR, allDetections map[string][]dsl.DataflowDetection) error { + report, err := sarif.New(sarif.Version210) + if err != nil { + return fmt.Errorf("failed to create SARIF report: %w", err) + } + + run := sarif.NewRunWithInformationURI("Code Pathfinder", "https://github.com/shivasurya/code-pathfinder") + + // Add all rules to the run + for _, rule := range rules { + // Create full description with CWE and OWASP info + fullDesc := rule.Rule.Description + if rule.Rule.CWE != "" || rule.Rule.OWASP != "" { + fullDesc += " (" + if rule.Rule.CWE != "" { + fullDesc += rule.Rule.CWE + } + if rule.Rule.OWASP != "" { + if rule.Rule.CWE != "" { + fullDesc += ", " + } + fullDesc += rule.Rule.OWASP + } + fullDesc += ")" + } + + sarifRule := run.AddRule(rule.Rule.ID). + WithDescription(fullDesc). 
+ WithName(rule.Rule.Name) + + // Map severity to SARIF level + level := "warning" + switch rule.Rule.Severity { + case "critical", "high": + level = "error" + case "medium": + level = "warning" + case "low": + level = "note" + } + sarifRule.WithDefaultConfiguration(sarif.NewReportingConfiguration().WithLevel(level)) + } + + // Add detections as results + for _, rule := range rules { + detections, ok := allDetections[rule.Rule.ID] + if !ok { + continue + } + + for _, detection := range detections { + // Create detailed message + message := fmt.Sprintf("%s in %s", rule.Rule.Description, detection.FunctionFQN) + if detection.SinkCall != "" { + message += fmt.Sprintf(" (sink: %s, confidence: %.0f%%)", detection.SinkCall, detection.Confidence*100) + } + + result := run.CreateResultForRule(rule.Rule.ID). + WithMessage(sarif.NewTextMessage(message)) + + // Add location + if detection.FunctionFQN != "" { + location := sarif.NewLocation(). + WithPhysicalLocation( + sarif.NewPhysicalLocation(). + WithRegion( + sarif.NewRegion(). + WithStartLine(detection.SinkLine). + WithEndLine(detection.SinkLine), + ), + ) + + result.AddLocation(location) + } + + // Note: Additional detection info (functionFQN, sinkCall, etc.) 
is included in the message + // SARIF v2 spec doesn't have a straightforward way to add custom properties to results + } + } + + report.AddRun(run) + + // Write to stdout + sarifJSON, err := json.MarshalIndent(report, "", " ") + if err != nil { + return fmt.Errorf("failed to marshal SARIF: %w", err) + } + + fmt.Println(string(sarifJSON)) + return nil +} + +func generateJSONOutput(rules []dsl.RuleIR, allDetections map[string][]dsl.DataflowDetection) error { + output := make(map[string]interface{}) + output["tool"] = "Code Pathfinder" + output["version"] = Version + + results := []map[string]interface{}{} + for _, rule := range rules { + detections, ok := allDetections[rule.Rule.ID] + if !ok { + continue + } + + for _, detection := range detections { + result := map[string]interface{}{ + "ruleId": rule.Rule.ID, + "ruleName": rule.Rule.Name, + "severity": rule.Rule.Severity, + "cwe": rule.Rule.CWE, + "owasp": rule.Rule.OWASP, + "description": rule.Rule.Description, + "functionFQN": detection.FunctionFQN, + "sinkLine": detection.SinkLine, + "sinkCall": detection.SinkCall, + "scope": detection.Scope, + "confidence": detection.Confidence, + } + + if detection.SourceLine > 0 { + result["sourceLine"] = detection.SourceLine + } + + if detection.TaintedVar != "" { + result["taintedVar"] = detection.TaintedVar + } + + results = append(results, result) + } + } + + output["results"] = results + output["summary"] = map[string]interface{}{ + "totalVulnerabilities": len(results), + "rulesExecuted": len(rules), + } + + jsonOutput, err := json.MarshalIndent(output, "", " ") + if err != nil { + return fmt.Errorf("failed to marshal JSON: %w", err) + } + + fmt.Println(string(jsonOutput)) + + // Exit with error code if vulnerabilities found + if len(results) > 0 { + osExit(1) + } + + return nil +} + +// Variable to allow mocking os.Exit in tests. 
+var osExit = os.Exit + func init() { rootCmd.AddCommand(ciCmd) + ciCmd.Flags().StringP("rules", "r", "", "Path to Python DSL rules file (required)") + ciCmd.Flags().StringP("project", "p", "", "Path to project directory to scan (required)") + ciCmd.Flags().StringP("output", "o", "sarif", "Output format: sarif or json (default: sarif)") + ciCmd.MarkFlagRequired("rules") + ciCmd.MarkFlagRequired("project") } diff --git a/sourcecode-parser/cmd/ci_test.go b/sourcecode-parser/cmd/ci_test.go new file mode 100644 index 00000000..a5889958 --- /dev/null +++ b/sourcecode-parser/cmd/ci_test.go @@ -0,0 +1,441 @@ +package cmd + +import ( + "bytes" + "encoding/json" + "io" + "os" + "testing" + + "github.com/shivasurya/code-pathfinder/sourcecode-parser/dsl" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +// Helper function to create test rules. +func createTestRule(id, name, severity, cwe, owasp, description string) dsl.RuleIR { + rule := dsl.RuleIR{} + rule.Rule.ID = id + rule.Rule.Name = name + rule.Rule.Severity = severity + rule.Rule.CWE = cwe + rule.Rule.OWASP = owasp + rule.Rule.Description = description + return rule +} + +func TestGenerateSARIFOutput(t *testing.T) { + t.Run("generates valid SARIF output with detections", func(t *testing.T) { + // Capture stdout + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + rules := []dsl.RuleIR{ + createTestRule("sql-injection", "SQL Injection", "critical", "CWE-89", "A03:2021", "Detects SQL injection vulnerabilities"), + } + + allDetections := map[string][]dsl.DataflowDetection{ + "sql-injection": { + { + FunctionFQN: "test.vulnerable", + SourceLine: 10, + SinkLine: 20, + SinkCall: "execute", + Confidence: 0.9, + Scope: "local", + }, + }, + } + + err := generateSARIFOutput(rules, allDetections) + require.NoError(t, err) + + // Restore stdout + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + // Parse JSON to verify structure + var 
sarifReport map[string]interface{} + err = json.Unmarshal([]byte(output), &sarifReport) + require.NoError(t, err) + + // Verify SARIF structure + assert.Equal(t, "2.1.0", sarifReport["version"]) + runs := sarifReport["runs"].([]interface{}) + assert.Len(t, runs, 1) + + run := runs[0].(map[string]interface{}) + tool := run["tool"].(map[string]interface{}) + driver := tool["driver"].(map[string]interface{}) + assert.Equal(t, "Code Pathfinder", driver["name"]) + + // Verify rule is included + rules_array := driver["rules"].([]interface{}) + assert.Len(t, rules_array, 1) + rule := rules_array[0].(map[string]interface{}) + assert.Equal(t, "sql-injection", rule["id"]) + assert.Equal(t, "SQL Injection", rule["name"]) + + // Check description field (could be "fullDescription" or "shortDescription") + if fullDesc, ok := rule["fullDescription"].(map[string]interface{}); ok { + assert.Contains(t, fullDesc["text"], "Detects SQL injection vulnerabilities") + } else if shortDesc, ok := rule["shortDescription"].(map[string]interface{}); ok { + assert.Contains(t, shortDesc["text"], "Detects SQL injection vulnerabilities") + } + + // Verify result is included + results := run["results"].([]interface{}) + assert.Len(t, results, 1) + result := results[0].(map[string]interface{}) + assert.Equal(t, "sql-injection", result["ruleId"]) + message := result["message"].(map[string]interface{}) + assert.Contains(t, message["text"], "test.vulnerable") + assert.Contains(t, message["text"], "execute") + assert.Contains(t, message["text"], "90%") + }) + + t.Run("generates SARIF with multiple rules and detections", func(t *testing.T) { + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + rules := []dsl.RuleIR{ + createTestRule("rule1", "Rule 1", "high", "CWE-1", "", "Rule 1 description"), + createTestRule("rule2", "Rule 2", "medium", "", "A01:2021", "Rule 2 description"), + } + + allDetections := map[string][]dsl.DataflowDetection{ + "rule1": { + {FunctionFQN: "test.func1", SinkLine: 10, 
Confidence: 0.8, Scope: "local"}, + }, + "rule2": { + {FunctionFQN: "test.func2", SinkLine: 20, Confidence: 0.7, Scope: "global"}, + {FunctionFQN: "test.func3", SinkLine: 30, Confidence: 0.6, Scope: "local"}, + }, + } + + err := generateSARIFOutput(rules, allDetections) + require.NoError(t, err) + + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + var sarifReport map[string]interface{} + err = json.Unmarshal([]byte(output), &sarifReport) + require.NoError(t, err) + + runs := sarifReport["runs"].([]interface{}) + run := runs[0].(map[string]interface{}) + + // Verify 2 rules + rules_array := run["tool"].(map[string]interface{})["driver"].(map[string]interface{})["rules"].([]interface{}) + assert.Len(t, rules_array, 2) + + // Verify 3 results total + results := run["results"].([]interface{}) + assert.Len(t, results, 3) + }) + + t.Run("generates SARIF with no detections", func(t *testing.T) { + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + rules := []dsl.RuleIR{ + createTestRule("clean-rule", "Clean Rule", "low", "", "", "No issues found"), + } + + allDetections := map[string][]dsl.DataflowDetection{} + + err := generateSARIFOutput(rules, allDetections) + require.NoError(t, err) + + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + var sarifReport map[string]interface{} + err = json.Unmarshal([]byte(output), &sarifReport) + require.NoError(t, err) + + runs := sarifReport["runs"].([]interface{}) + run := runs[0].(map[string]interface{}) + + // Verify rule is included + rules_array := run["tool"].(map[string]interface{})["driver"].(map[string]interface{})["rules"].([]interface{}) + assert.Len(t, rules_array, 1) + + // Verify no results + results := run["results"].([]interface{}) + assert.Len(t, results, 0) + }) + + t.Run("maps severity levels correctly", func(t *testing.T) { + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + rules := []dsl.RuleIR{ + 
createTestRule("r1", "R1", "critical", "", "", "D1"), + createTestRule("r2", "R2", "high", "", "", "D2"), + createTestRule("r3", "R3", "medium", "", "", "D3"), + createTestRule("r4", "R4", "low", "", "", "D4"), + } + + err := generateSARIFOutput(rules, map[string][]dsl.DataflowDetection{}) + require.NoError(t, err) + + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + var sarifReport map[string]interface{} + err = json.Unmarshal([]byte(output), &sarifReport) + require.NoError(t, err) + + runs := sarifReport["runs"].([]interface{}) + run := runs[0].(map[string]interface{}) + rules_array := run["tool"].(map[string]interface{})["driver"].(map[string]interface{})["rules"].([]interface{}) + + // Verify severity mappings + for _, r := range rules_array { + rule := r.(map[string]interface{}) + config := rule["defaultConfiguration"].(map[string]interface{}) + level := config["level"].(string) + + switch rule["id"].(string) { + case "r1", "r2": + assert.Equal(t, "error", level, "critical/high should map to error") + case "r3": + assert.Equal(t, "warning", level, "medium should map to warning") + case "r4": + assert.Equal(t, "note", level, "low should map to note") + } + } + }) +} + +func TestGenerateJSONOutput(t *testing.T) { + t.Run("generates valid JSON output with detections", func(t *testing.T) { + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + // Prevent os.Exit(1) in test + oldOSExit := osExit + exitCode := 0 + osExit = func(code int) { + exitCode = code + } + defer func() { osExit = oldOSExit }() + + rules := []dsl.RuleIR{ + createTestRule("xss-vuln", "XSS Vulnerability", "high", "CWE-79", "A03:2021", "Cross-site scripting vulnerability"), + } + + allDetections := map[string][]dsl.DataflowDetection{ + "xss-vuln": { + { + FunctionFQN: "web.render", + SourceLine: 5, + SinkLine: 15, + SinkCall: "innerHTML", + TaintedVar: "user_input", + Confidence: 0.85, + Scope: "local", + }, + }, + } + + err := 
generateJSONOutput(rules, allDetections) + require.NoError(t, err) + + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + // Parse JSON + var jsonOutput map[string]interface{} + err = json.Unmarshal([]byte(output), &jsonOutput) + require.NoError(t, err) + + // Verify structure + assert.Equal(t, "Code Pathfinder", jsonOutput["tool"]) + assert.Equal(t, Version, jsonOutput["version"]) + + results := jsonOutput["results"].([]interface{}) + assert.Len(t, results, 1) + + result := results[0].(map[string]interface{}) + assert.Equal(t, "xss-vuln", result["ruleId"]) + assert.Equal(t, "XSS Vulnerability", result["ruleName"]) + assert.Equal(t, "high", result["severity"]) + assert.Equal(t, "CWE-79", result["cwe"]) + assert.Equal(t, "A03:2021", result["owasp"]) + assert.Equal(t, "Cross-site scripting vulnerability", result["description"]) + assert.Equal(t, "web.render", result["functionFQN"]) + assert.Equal(t, float64(5), result["sourceLine"]) + assert.Equal(t, float64(15), result["sinkLine"]) + assert.Equal(t, "innerHTML", result["sinkCall"]) + assert.Equal(t, "user_input", result["taintedVar"]) + assert.Equal(t, "local", result["scope"]) + assert.Equal(t, 0.85, result["confidence"]) + + summary := jsonOutput["summary"].(map[string]interface{}) + assert.Equal(t, float64(1), summary["totalVulnerabilities"]) + assert.Equal(t, float64(1), summary["rulesExecuted"]) + + // Verify os.Exit(1) was called + assert.Equal(t, 1, exitCode) + }) + + t.Run("generates JSON with multiple detections", func(t *testing.T) { + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + oldOSExit := osExit + osExit = func(code int) {} + defer func() { osExit = oldOSExit }() + + rules := []dsl.RuleIR{ + createTestRule("r1", "R1", "high", "CWE-1", "A01", "D1"), + createTestRule("r2", "R2", "medium", "CWE-2", "A02", "D2"), + } + + allDetections := map[string][]dsl.DataflowDetection{ + "r1": { + {FunctionFQN: "f1", SinkLine: 10, Confidence: 0.9, Scope: 
"local"}, + }, + "r2": { + {FunctionFQN: "f2", SinkLine: 20, Confidence: 0.8, Scope: "global"}, + {FunctionFQN: "f3", SinkLine: 30, Confidence: 0.7, Scope: "local"}, + }, + } + + err := generateJSONOutput(rules, allDetections) + require.NoError(t, err) + + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + var jsonOutput map[string]interface{} + err = json.Unmarshal([]byte(output), &jsonOutput) + require.NoError(t, err) + + results := jsonOutput["results"].([]interface{}) + assert.Len(t, results, 3) + + summary := jsonOutput["summary"].(map[string]interface{}) + assert.Equal(t, float64(3), summary["totalVulnerabilities"]) + assert.Equal(t, float64(2), summary["rulesExecuted"]) + }) + + t.Run("generates JSON with no detections", func(t *testing.T) { + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + oldOSExit := osExit + exitCode := 0 + osExit = func(code int) { + exitCode = code + } + defer func() { osExit = oldOSExit }() + + rules := []dsl.RuleIR{ + createTestRule("clean", "Clean", "low", "", "", "No issues"), + } + + allDetections := map[string][]dsl.DataflowDetection{} + + err := generateJSONOutput(rules, allDetections) + require.NoError(t, err) + + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + var jsonOutput map[string]interface{} + err = json.Unmarshal([]byte(output), &jsonOutput) + require.NoError(t, err) + + results := jsonOutput["results"].([]interface{}) + assert.Len(t, results, 0) + + summary := jsonOutput["summary"].(map[string]interface{}) + assert.Equal(t, float64(0), summary["totalVulnerabilities"]) + + // Verify os.Exit(1) was NOT called + assert.Equal(t, 0, exitCode) + }) + + t.Run("handles optional fields correctly", func(t *testing.T) { + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + oldOSExit := osExit + osExit = func(code int) {} + defer func() { osExit = oldOSExit }() + + rules := []dsl.RuleIR{ + 
createTestRule("minimal", "Minimal", "low", "", "", "Minimal detection"), + } + + allDetections := map[string][]dsl.DataflowDetection{ + "minimal": { + { + FunctionFQN: "func", + SinkLine: 10, + Confidence: 0.5, + Scope: "local", + // No SourceLine, SinkCall, or TaintedVar + }, + }, + } + + err := generateJSONOutput(rules, allDetections) + require.NoError(t, err) + + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + var jsonOutput map[string]interface{} + err = json.Unmarshal([]byte(output), &jsonOutput) + require.NoError(t, err) + + results := jsonOutput["results"].([]interface{}) + result := results[0].(map[string]interface{}) + + // Verify optional fields are not present or empty + _, hasSourceLine := result["sourceLine"] + assert.False(t, hasSourceLine, "sourceLine should not be present when 0") + + _, hasTaintedVar := result["taintedVar"] + if hasTaintedVar { + assert.Equal(t, "", result["taintedVar"]) + } + }) +} diff --git a/sourcecode-parser/cmd/query.go b/sourcecode-parser/cmd/query.go index c8d667ed..efe0c3fe 100644 --- a/sourcecode-parser/cmd/query.go +++ b/sourcecode-parser/cmd/query.go @@ -2,18 +2,100 @@ package cmd import ( "fmt" + "log" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/dsl" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/graph" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/graph/callgraph" "github.com/spf13/cobra" ) var queryCmd = &cobra.Command{ Use: "query", - Short: "Query mode - Python DSL implementation in progress", + Short: "Query code using Python DSL rules", + Long: `Query codebase using Python DSL security rules. + +Similar to scan but designed for ad-hoc queries and exploration. 
+ +Examples: + # Query with a single rule + pathfinder query --rules my_rule.py --project /path/to/project + + # Query specific files + pathfinder query --rules rule.py --project /path/to/file.py`, RunE: func(cmd *cobra.Command, args []string) error { - return fmt.Errorf("Query command not yet implemented in new architecture") + rulesPath, _ := cmd.Flags().GetString("rules") + projectPath, _ := cmd.Flags().GetString("project") + + if rulesPath == "" { + return fmt.Errorf("--rules flag is required") + } + + if projectPath == "" { + return fmt.Errorf("--project flag is required") + } + + // Build code graph (AST) + log.Printf("Building code graph from %s...\n", projectPath) + codeGraph := graph.Initialize(projectPath) + if len(codeGraph.Nodes) == 0 { + return fmt.Errorf("no source files found in project") + } + log.Printf("Code graph built: %d nodes\n", len(codeGraph.Nodes)) + + // Build module registry + log.Printf("Building module registry...\n") + registry, err := callgraph.BuildModuleRegistry(projectPath) + if err != nil { + log.Printf("Warning: failed to build module registry: %v\n", err) + registry = callgraph.NewModuleRegistry() + } + + // Build callgraph + log.Printf("Building callgraph...\n") + cg, err := callgraph.BuildCallGraph(codeGraph, registry, projectPath) + if err != nil { + return fmt.Errorf("failed to build callgraph: %w", err) + } + log.Printf("Callgraph built: %d functions, %d call sites\n", + len(cg.Functions), countTotalCallSites(cg)) + + // Load Python DSL rules + log.Printf("Loading rules from %s...\n", rulesPath) + loader := dsl.NewRuleLoader(rulesPath) + rules, err := loader.LoadRules() + if err != nil { + return fmt.Errorf("failed to load rules: %w", err) + } + log.Printf("Loaded %d rules\n", len(rules)) + + // Execute rules against callgraph + log.Printf("\n=== Query Results ===\n") + totalDetections := 0 + for _, rule := range rules { + detections, err := loader.ExecuteRule(&rule, cg) + if err != nil { + log.Printf("Error executing rule 
%s: %v\n", rule.Rule.ID, err) + continue + } + + if len(detections) > 0 { + printDetections(rule, detections) + totalDetections += len(detections) + } + } + + log.Printf("\n=== Query Complete ===\n") + log.Printf("Total matches: %d\n", totalDetections) + + return nil }, } func init() { rootCmd.AddCommand(queryCmd) + queryCmd.Flags().StringP("rules", "r", "", "Path to Python DSL rules file (required)") + queryCmd.Flags().StringP("project", "p", "", "Path to project directory to query (required)") + queryCmd.MarkFlagRequired("rules") + queryCmd.MarkFlagRequired("project") } diff --git a/sourcecode-parser/cmd/query_test.go b/sourcecode-parser/cmd/query_test.go index 4cfc411d..3f243414 100644 --- a/sourcecode-parser/cmd/query_test.go +++ b/sourcecode-parser/cmd/query_test.go @@ -6,14 +6,15 @@ import ( "github.com/stretchr/testify/assert" ) -func TestQueryCommandStub(t *testing.T) { +func TestQueryCommand(t *testing.T) { cmd := queryCmd assert.NotNil(t, cmd) assert.Equal(t, "query", cmd.Use) + assert.Equal(t, "Query code using Python DSL rules", cmd.Short) - // Test execution returns error for unimplemented command + // Test execution returns error when required flags are missing err := cmd.RunE(cmd, []string{}) assert.Error(t, err) - assert.Contains(t, err.Error(), "not yet implemented") + assert.Contains(t, err.Error(), "required") } diff --git a/sourcecode-parser/cmd/scan.go b/sourcecode-parser/cmd/scan.go index 9514847e..f734c3be 100644 --- a/sourcecode-parser/cmd/scan.go +++ b/sourcecode-parser/cmd/scan.go @@ -2,18 +2,142 @@ package cmd import ( "fmt" + "log" + "os" + "path/filepath" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/dsl" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/graph" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/graph/callgraph" "github.com/spf13/cobra" ) var scanCmd = &cobra.Command{ Use: "scan", - Short: "Scan mode - Python DSL implementation in progress", + Short: "Scan code for security vulnerabilities 
using Python DSL rules", + Long: `Scan codebase using Python DSL security rules. + +Examples: + # Scan with OWASP rules + pathfinder scan --rules rules/owasp_top10.py --project /path/to/project + + # Scan with custom rules + pathfinder scan --rules my_rules.py --project .`, RunE: func(cmd *cobra.Command, args []string) error { - return fmt.Errorf("Scan command not yet implemented in new architecture") + rulesPath, _ := cmd.Flags().GetString("rules") + projectPath, _ := cmd.Flags().GetString("project") + + if rulesPath == "" { + return fmt.Errorf("--rules flag is required") + } + + if projectPath == "" { + return fmt.Errorf("--project flag is required") + } + + // Convert project path to absolute path to ensure consistency + absProjectPath, err := filepath.Abs(projectPath) + if err != nil { + return fmt.Errorf("failed to resolve project path: %w", err) + } + projectPath = absProjectPath + + // Step 1: Build code graph (AST) + log.Printf("Building code graph from %s...\n", projectPath) + codeGraph := graph.Initialize(projectPath) + if len(codeGraph.Nodes) == 0 { + return fmt.Errorf("no source files found in project") + } + log.Printf("Code graph built: %d nodes\n", len(codeGraph.Nodes)) + + // Step 2: Build module registry + log.Printf("Building module registry...\n") + registry, err := callgraph.BuildModuleRegistry(projectPath) + if err != nil { + log.Printf("Warning: failed to build module registry: %v\n", err) + // Create empty registry as fallback + registry = callgraph.NewModuleRegistry() + } + + // Step 3: Build callgraph + log.Printf("Building callgraph...\n") + cg, err := callgraph.BuildCallGraph(codeGraph, registry, projectPath) + if err != nil { + return fmt.Errorf("failed to build callgraph: %w", err) + } + log.Printf("Callgraph built: %d functions, %d call sites\n", + len(cg.Functions), countTotalCallSites(cg)) + + // Step 4: Load Python DSL rules + log.Printf("Loading rules from %s...\n", rulesPath) + loader := dsl.NewRuleLoader(rulesPath) + rules, err 
:= loader.LoadRules() + if err != nil { + return fmt.Errorf("failed to load rules: %w", err) + } + log.Printf("Loaded %d rules\n", len(rules)) + + // Step 5: Execute rules against callgraph + log.Printf("\n=== Running Security Scan ===\n") + totalDetections := 0 + for _, rule := range rules { + detections, err := loader.ExecuteRule(&rule, cg) + if err != nil { + log.Printf("Error executing rule %s: %v\n", rule.Rule.ID, err) + continue + } + + if len(detections) > 0 { + printDetections(rule, detections) + totalDetections += len(detections) + } + } + + // Step 6: Print summary + log.Printf("\n=== Scan Complete ===\n") + log.Printf("Total vulnerabilities found: %d\n", totalDetections) + + if totalDetections > 0 { + os.Exit(1) // Exit with error code if vulnerabilities found + } + + return nil }, } +func countTotalCallSites(cg *callgraph.CallGraph) int { + total := 0 + for _, sites := range cg.CallSites { + total += len(sites) + } + return total +} + +func printDetections(rule dsl.RuleIR, detections []dsl.DataflowDetection) { + fmt.Printf("\n[%s] %s (%s)\n", rule.Rule.Severity, rule.Rule.ID, rule.Rule.Name) + fmt.Printf(" CWE: %s | OWASP: %s\n", rule.Rule.CWE, rule.Rule.OWASP) + fmt.Printf(" %s\n", rule.Rule.Description) + + for _, detection := range detections { + fmt.Printf("\n → %s:%d\n", detection.FunctionFQN, detection.SinkLine) + if detection.SourceLine > 0 { + fmt.Printf(" Source: line %d\n", detection.SourceLine) + } + if detection.SinkCall != "" { + fmt.Printf(" Sink: %s (line %d)\n", detection.SinkCall, detection.SinkLine) + } + if detection.TaintedVar != "" { + fmt.Printf(" Tainted variable: %s\n", detection.TaintedVar) + } + fmt.Printf(" Confidence: %.0f%%\n", detection.Confidence*100) + fmt.Printf(" Scope: %s\n", detection.Scope) + } +} + func init() { rootCmd.AddCommand(scanCmd) + scanCmd.Flags().StringP("rules", "r", "", "Path to Python DSL rules file (required)") + scanCmd.Flags().StringP("project", "p", "", "Path to project directory to scan 
(required)") + scanCmd.MarkFlagRequired("rules") + scanCmd.MarkFlagRequired("project") } diff --git a/sourcecode-parser/cmd/scan_test.go b/sourcecode-parser/cmd/scan_test.go new file mode 100644 index 00000000..0096d342 --- /dev/null +++ b/sourcecode-parser/cmd/scan_test.go @@ -0,0 +1,173 @@ +package cmd + +import ( + "bytes" + "io" + "os" + "testing" + + "github.com/shivasurya/code-pathfinder/sourcecode-parser/dsl" + "github.com/shivasurya/code-pathfinder/sourcecode-parser/graph/callgraph" + "github.com/stretchr/testify/assert" +) + +// Helper function to create test rules (duplicated from ci_test.go). +func createTestRuleScan(id, name, severity, cwe, owasp, description string) dsl.RuleIR { + rule := dsl.RuleIR{} + rule.Rule.ID = id + rule.Rule.Name = name + rule.Rule.Severity = severity + rule.Rule.CWE = cwe + rule.Rule.OWASP = owasp + rule.Rule.Description = description + return rule +} + +func TestCountTotalCallSites(t *testing.T) { + t.Run("counts call sites across all functions", func(t *testing.T) { + cg := callgraph.NewCallGraph() + cg.CallSites["func1"] = []callgraph.CallSite{ + {Target: "foo", Location: callgraph.Location{Line: 10}}, + {Target: "bar", Location: callgraph.Location{Line: 20}}, + } + cg.CallSites["func2"] = []callgraph.CallSite{ + {Target: "baz", Location: callgraph.Location{Line: 30}}, + } + + total := countTotalCallSites(cg) + assert.Equal(t, 3, total) + }) + + t.Run("returns zero for empty callgraph", func(t *testing.T) { + cg := callgraph.NewCallGraph() + total := countTotalCallSites(cg) + assert.Equal(t, 0, total) + }) + + t.Run("handles function with no call sites", func(t *testing.T) { + cg := callgraph.NewCallGraph() + cg.CallSites["func1"] = []callgraph.CallSite{} + total := countTotalCallSites(cg) + assert.Equal(t, 0, total) + }) +} + +func TestPrintDetections(t *testing.T) { + t.Run("prints detections with all fields", func(t *testing.T) { + // Capture stdout + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + rule := 
createTestRuleScan("test-rule", "Test Rule", "high", "CWE-89", "A03:2021", "Test SQL injection detection") + + detections := []dsl.DataflowDetection{ + { + FunctionFQN: "test.vulnerable_func", + SourceLine: 10, + SinkLine: 20, + SinkCall: "execute", + TaintedVar: "user_input", + Confidence: 0.9, + Scope: "local", + }, + } + + printDetections(rule, detections) + + // Restore stdout + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + // Verify output contains expected information + assert.Contains(t, output, "[high] test-rule (Test Rule)") + assert.Contains(t, output, "CWE: CWE-89") + assert.Contains(t, output, "OWASP: A03:2021") + assert.Contains(t, output, "Test SQL injection detection") + assert.Contains(t, output, "test.vulnerable_func:20") + assert.Contains(t, output, "Source: line 10") + assert.Contains(t, output, "Sink: execute (line 20)") + assert.Contains(t, output, "Tainted variable: user_input") + assert.Contains(t, output, "Confidence: 90%") + assert.Contains(t, output, "Scope: local") + }) + + t.Run("prints detections without optional fields", func(t *testing.T) { + // Capture stdout + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + rule := createTestRuleScan("simple-rule", "Simple Rule", "medium", "", "", "Simple detection") + + detections := []dsl.DataflowDetection{ + { + FunctionFQN: "test.func", + SinkLine: 15, + Confidence: 0.5, + Scope: "global", + }, + } + + printDetections(rule, detections) + + // Restore stdout + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + // Verify output + assert.Contains(t, output, "[medium] simple-rule (Simple Rule)") + assert.Contains(t, output, "test.func:15") + assert.Contains(t, output, "Confidence: 50%") + assert.Contains(t, output, "Scope: global") + // Should not contain optional fields + assert.NotContains(t, output, "Source: line 0") + assert.NotContains(t, output, "Sink: ") + assert.NotContains(t, 
output, "Tainted variable: ") + }) + + t.Run("prints multiple detections", func(t *testing.T) { + // Capture stdout + old := os.Stdout + r, w, _ := os.Pipe() + os.Stdout = w + + rule := createTestRuleScan("multi-rule", "Multi Rule", "critical", "CWE-79", "A03:2021", "XSS detection") + + detections := []dsl.DataflowDetection{ + { + FunctionFQN: "test.func1", + SinkLine: 10, + Confidence: 0.8, + Scope: "local", + }, + { + FunctionFQN: "test.func2", + SinkLine: 20, + Confidence: 0.7, + Scope: "local", + }, + } + + printDetections(rule, detections) + + // Restore stdout + w.Close() + os.Stdout = old + var buf bytes.Buffer + io.Copy(&buf, r) + output := buf.String() + + // Verify both detections are printed + assert.Contains(t, output, "test.func1:10") + assert.Contains(t, output, "test.func2:20") + assert.Contains(t, output, "Confidence: 80%") + assert.Contains(t, output, "Confidence: 70%") + }) +} diff --git a/sourcecode-parser/dsl/dataflow_executor.go b/sourcecode-parser/dsl/dataflow_executor.go index 2010c286..3757a60a 100644 --- a/sourcecode-parser/dsl/dataflow_executor.go +++ b/sourcecode-parser/dsl/dataflow_executor.go @@ -1,7 +1,6 @@ package dsl import ( - "log" "strings" "github.com/shivasurya/code-pathfinder/sourcecode-parser/graph/callgraph" @@ -30,63 +29,60 @@ func (e *DataflowExecutor) Execute() []DataflowDetection { } // executeLocal performs intra-procedural taint analysis. -// REUSES existing AnalyzeIntraProceduralTaint() from callgraph/taint.go. +// NOTE: This is a simplified implementation that checks for taint flows +// based on call site patterns rather than full dataflow analysis. +// Full taint analysis integration requires re-running analysis with DSL patterns. 
func (e *DataflowExecutor) executeLocal() []DataflowDetection { detections := []DataflowDetection{} - // Convert IR patterns to strings for existing API + // Convert IR patterns to strings sourcePatterns := e.extractPatterns(e.IR.Sources) sinkPatterns := e.extractPatterns(e.IR.Sinks) sanitizerPatterns := e.extractPatterns(e.IR.Sanitizers) - // Find all source and sink call sites + // Find call sites matching sources and sinks sourceCalls := e.findMatchingCalls(sourcePatterns) sinkCalls := e.findMatchingCalls(sinkPatterns) + sanitizerCalls := e.findMatchingCalls(sanitizerPatterns) - // For each function that has both sources and sinks - functionsToAnalyze := e.findFunctionsWithSourcesAndSinks(sourceCalls, sinkCalls) + // For local scope, check if source and sink are in the same function + for _, source := range sourceCalls { + for _, sink := range sinkCalls { + // Only detect within same function for local scope + if source.FunctionFQN != sink.FunctionFQN { + continue + } - for _, functionFQN := range functionsToAnalyze { - // Call EXISTING intra-procedural analysis - detection := e.analyzeFunction(functionFQN, sourcePatterns, sinkPatterns, sanitizerPatterns) - if detection != nil { - detections = append(detections, *detection) - } - } + // Check if there's a sanitizer in between (same function) + hasSanitizer := false + for _, sanitizer := range sanitizerCalls { + if sanitizer.FunctionFQN == source.FunctionFQN { + // If sanitizer is between source and sink, mark as sanitized + if (sanitizer.Line > source.Line && sanitizer.Line < sink.Line) || + (sanitizer.Line > sink.Line && sanitizer.Line < source.Line) { + hasSanitizer = true + break + } + } + } - return detections -} + // Create detection + detection := DataflowDetection{ + FunctionFQN: source.FunctionFQN, + SourceLine: source.Line, + SinkLine: sink.Line, + TaintedVar: "", // Not tracking variable names in this simplified version + SinkCall: sink.CallSite.Target, + Confidence: 0.7, // Medium confidence for 
pattern-based detection + Sanitized: hasSanitizer, + Scope: "local", + } -// analyzeFunction calls the EXISTING checkIntraProceduralTaint logic. -// -//nolint:unparam // Parameters will be used in future PRs -func (e *DataflowExecutor) analyzeFunction( - functionFQN string, - sourcePatterns []string, - sinkPatterns []string, - sanitizerPatterns []string, -) *DataflowDetection { - // Get function node - funcNode, ok := e.CallGraph.Functions[functionFQN] - if !ok { - return nil + detections = append(detections, detection) + } } - // TODO: Full integration requires AST parsing infrastructure - // For now, this is a placeholder that demonstrates the integration pattern - // The actual implementation would: - // 1. Parse the source file to get AST - // 2. Find the function node in the AST - // 3. Call ExtractStatements(filePath, sourceCode, functionNode) - // 4. Build def-use chains - // 5. Call AnalyzeIntraProceduralTaint - // 6. Convert results to DataflowDetection - - log.Printf("Would analyze function %s in file %s", functionFQN, funcNode.File) - - // Placeholder: return nil for now - // Real implementation will be completed in future PRs - return nil + return detections } // executeGlobal performs inter-procedural taint analysis. 
diff --git a/sourcecode-parser/dsl/dataflow_executor_test.go b/sourcecode-parser/dsl/dataflow_executor_test.go index eb66a3fe..561fff7d 100644 --- a/sourcecode-parser/dsl/dataflow_executor_test.go +++ b/sourcecode-parser/dsl/dataflow_executor_test.go @@ -42,6 +42,164 @@ func TestDataflowExecutor_Local(t *testing.T) { assert.Contains(t, functions, "test.vulnerable") }) + + t.Run("executes local analysis and finds detections", func(t *testing.T) { + cg := callgraph.NewCallGraph() + cg.CallSites["test.dangerous"] = []callgraph.CallSite{ + { + Target: "request.POST", + Location: callgraph.Location{File: "test.py", Line: 5}, + }, + { + Target: "execute", + Location: callgraph.Location{File: "test.py", Line: 10}, + }, + } + + ir := &DataflowIR{ + Sources: []CallMatcherIR{{Patterns: []string{"request.POST"}}}, + Sinks: []CallMatcherIR{{Patterns: []string{"execute"}}}, + Sanitizers: []CallMatcherIR{}, + Scope: "local", + } + + executor := NewDataflowExecutor(ir, cg) + detections := executor.executeLocal() + + assert.Len(t, detections, 1) + assert.Equal(t, "test.dangerous", detections[0].FunctionFQN) + assert.Equal(t, 5, detections[0].SourceLine) + assert.Equal(t, 10, detections[0].SinkLine) + assert.Equal(t, "execute", detections[0].SinkCall) + assert.Equal(t, "local", detections[0].Scope) + assert.Equal(t, 0.7, detections[0].Confidence) + assert.False(t, detections[0].Sanitized) + }) + + t.Run("detects sanitizer between source and sink", func(t *testing.T) { + cg := callgraph.NewCallGraph() + cg.CallSites["test.safe"] = []callgraph.CallSite{ + { + Target: "request.GET", + Location: callgraph.Location{File: "test.py", Line: 5}, + }, + { + Target: "escape_sql", + Location: callgraph.Location{File: "test.py", Line: 8}, + }, + { + Target: "execute", + Location: callgraph.Location{File: "test.py", Line: 12}, + }, + } + + ir := &DataflowIR{ + Sources: []CallMatcherIR{{Patterns: []string{"request.GET"}}}, + Sinks: []CallMatcherIR{{Patterns: []string{"execute"}}}, + Sanitizers: 
[]CallMatcherIR{{Patterns: []string{"escape_sql"}}}, + Scope: "local", + } + + executor := NewDataflowExecutor(ir, cg) + detections := executor.executeLocal() + + assert.Len(t, detections, 1) + assert.True(t, detections[0].Sanitized, "Should detect sanitizer between source and sink") + }) + + t.Run("detects sanitizer in reverse order (sink before source)", func(t *testing.T) { + cg := callgraph.NewCallGraph() + cg.CallSites["test.reverse"] = []callgraph.CallSite{ + { + Target: "execute", + Location: callgraph.Location{File: "test.py", Line: 5}, + }, + { + Target: "escape_sql", + Location: callgraph.Location{File: "test.py", Line: 8}, + }, + { + Target: "request.GET", + Location: callgraph.Location{File: "test.py", Line: 12}, + }, + } + + ir := &DataflowIR{ + Sources: []CallMatcherIR{{Patterns: []string{"request.GET"}}}, + Sinks: []CallMatcherIR{{Patterns: []string{"execute"}}}, + Sanitizers: []CallMatcherIR{{Patterns: []string{"escape_sql"}}}, + Scope: "local", + } + + executor := NewDataflowExecutor(ir, cg) + detections := executor.executeLocal() + + assert.Len(t, detections, 1) + assert.True(t, detections[0].Sanitized, "Should detect sanitizer even when sink appears before source") + }) + + t.Run("ignores cross-function flows in local scope", func(t *testing.T) { + cg := callgraph.NewCallGraph() + cg.CallSites["test.func1"] = []callgraph.CallSite{ + { + Target: "request.GET", + Location: callgraph.Location{File: "test.py", Line: 5}, + }, + } + cg.CallSites["test.func2"] = []callgraph.CallSite{ + { + Target: "eval", + Location: callgraph.Location{File: "test.py", Line: 15}, + }, + } + + ir := &DataflowIR{ + Sources: []CallMatcherIR{{Patterns: []string{"request.GET"}}}, + Sinks: []CallMatcherIR{{Patterns: []string{"eval"}}}, + Sanitizers: []CallMatcherIR{}, + Scope: "local", + } + + executor := NewDataflowExecutor(ir, cg) + detections := executor.executeLocal() + + assert.Empty(t, detections, "Local scope should not detect cross-function flows") + }) + + 
t.Run("handles multiple sources and sinks in same function", func(t *testing.T) { + cg := callgraph.NewCallGraph() + cg.CallSites["test.multi"] = []callgraph.CallSite{ + { + Target: "request.GET", + Location: callgraph.Location{File: "test.py", Line: 5}, + }, + { + Target: "request.POST", + Location: callgraph.Location{File: "test.py", Line: 7}, + }, + { + Target: "eval", + Location: callgraph.Location{File: "test.py", Line: 10}, + }, + { + Target: "execute", + Location: callgraph.Location{File: "test.py", Line: 15}, + }, + } + + ir := &DataflowIR{ + Sources: []CallMatcherIR{{Patterns: []string{"request.GET", "request.POST"}}}, + Sinks: []CallMatcherIR{{Patterns: []string{"eval", "execute"}}}, + Sanitizers: []CallMatcherIR{}, + Scope: "local", + } + + executor := NewDataflowExecutor(ir, cg) + detections := executor.executeLocal() + + // Should find 2 sources * 2 sinks = 4 combinations + assert.Len(t, detections, 4) + }) } func TestDataflowExecutor_Global(t *testing.T) { @@ -86,6 +244,52 @@ func TestDataflowExecutor_Global(t *testing.T) { assert.Contains(t, path, "test.process") }) + t.Run("executes global analysis and finds cross-function flows", func(t *testing.T) { + // Setup: Source in func A, sink in func B, A calls B + cg := callgraph.NewCallGraph() + cg.Edges = make(map[string][]string) + cg.Edges["test.source_func"] = []string{"test.sink_func"} + + cg.CallSites["test.source_func"] = []callgraph.CallSite{ + { + Target: "request.GET", + Location: callgraph.Location{Line: 10, File: "test.py"}, + }, + } + + cg.CallSites["test.sink_func"] = []callgraph.CallSite{ + { + Target: "eval", + Location: callgraph.Location{Line: 20, File: "test.py"}, + }, + } + + ir := &DataflowIR{ + Sources: []CallMatcherIR{{Patterns: []string{"request.GET"}}}, + Sinks: []CallMatcherIR{{Patterns: []string{"eval"}}}, + Sanitizers: []CallMatcherIR{}, + Scope: "global", + } + + executor := NewDataflowExecutor(ir, cg) + detections := executor.executeGlobal() + + // Should detect 
cross-function flow + assert.NotEmpty(t, detections) + found := false + for _, d := range detections { + if d.FunctionFQN == "test.source_func" && d.Scope == "global" { + found = true + assert.Equal(t, 10, d.SourceLine) + assert.Equal(t, 20, d.SinkLine) + assert.Equal(t, "eval", d.SinkCall) + assert.False(t, d.Sanitized) + assert.Equal(t, 0.8, d.Confidence) + } + } + assert.True(t, found, "Should find cross-function detection") + }) + t.Run("detects sanitizer on path", func(t *testing.T) { cg := callgraph.NewCallGraph() cg.Edges = make(map[string][]string) @@ -115,6 +319,53 @@ func TestDataflowExecutor_Global(t *testing.T) { hasSanitizer := executor.pathHasSanitizer(path, sanitizerCalls) assert.True(t, hasSanitizer) }) + + t.Run("excludes flows with sanitizer on path", func(t *testing.T) { + cg := callgraph.NewCallGraph() + cg.Edges = make(map[string][]string) + cg.Edges["test.source"] = []string{"test.sanitize"} + cg.Edges["test.sanitize"] = []string{"test.sink"} + + cg.CallSites["test.source"] = []callgraph.CallSite{ + { + Target: "request.POST", + Location: callgraph.Location{Line: 5, File: "test.py"}, + }, + } + + cg.CallSites["test.sanitize"] = []callgraph.CallSite{ + { + Target: "escape_html", + Location: callgraph.Location{Line: 10, File: "test.py"}, + }, + } + + cg.CallSites["test.sink"] = []callgraph.CallSite{ + { + Target: "render", + Location: callgraph.Location{Line: 15, File: "test.py"}, + }, + } + + ir := &DataflowIR{ + Sources: []CallMatcherIR{{Patterns: []string{"request.POST"}}}, + Sinks: []CallMatcherIR{{Patterns: []string{"render"}}}, + Sanitizers: []CallMatcherIR{{Patterns: []string{"escape_html"}}}, + Scope: "global", + } + + executor := NewDataflowExecutor(ir, cg) + detections := executor.executeGlobal() + + // Should NOT detect because sanitizer is on the path + globalDetections := []DataflowDetection{} + for _, d := range detections { + if d.Scope == "global" { + globalDetections = append(globalDetections, d) + } + } + assert.Empty(t, 
globalDetections, "Should not detect flows with sanitizer on path") + }) } func TestDataflowExecutor_PatternMatching(t *testing.T) { diff --git a/sourcecode-parser/dsl/ir_types_test.go b/sourcecode-parser/dsl/ir_types_test.go new file mode 100644 index 00000000..f7eb5ac5 --- /dev/null +++ b/sourcecode-parser/dsl/ir_types_test.go @@ -0,0 +1,129 @@ +package dsl + +import ( + "testing" + + "github.com/stretchr/testify/assert" +) + +func TestCallMatcherIR_GetType(t *testing.T) { + t.Run("returns correct IR type", func(t *testing.T) { + matcher := &CallMatcherIR{ + Type: "call_matcher", + Patterns: []string{"eval", "exec"}, + Wildcard: false, + MatchMode: "any", + } + + assert.Equal(t, IRTypeCallMatcher, matcher.GetType()) + }) + + t.Run("works with wildcard patterns", func(t *testing.T) { + matcher := &CallMatcherIR{ + Type: "call_matcher", + Patterns: []string{"request.*", "*.GET"}, + Wildcard: true, + MatchMode: "all", + } + + assert.Equal(t, IRTypeCallMatcher, matcher.GetType()) + }) +} + +func TestVariableMatcherIR_GetType(t *testing.T) { + t.Run("returns correct IR type", func(t *testing.T) { + matcher := &VariableMatcherIR{ + Type: "variable_matcher", + Pattern: "user_input", + Wildcard: false, + } + + assert.Equal(t, IRTypeVariableMatcher, matcher.GetType()) + }) + + t.Run("works with wildcard pattern", func(t *testing.T) { + matcher := &VariableMatcherIR{ + Type: "variable_matcher", + Pattern: "user_*", + Wildcard: true, + } + + assert.Equal(t, IRTypeVariableMatcher, matcher.GetType()) + }) +} + +func TestDataflowIR_GetType(t *testing.T) { + t.Run("returns correct IR type", func(t *testing.T) { + dataflow := &DataflowIR{ + Type: "dataflow", + Sources: []CallMatcherIR{ + {Type: "call_matcher", Patterns: []string{"request.GET"}}, + }, + Sinks: []CallMatcherIR{ + {Type: "call_matcher", Patterns: []string{"eval"}}, + }, + Sanitizers: []CallMatcherIR{ + {Type: "call_matcher", Patterns: []string{"escape"}}, + }, + Scope: "local", + } + + assert.Equal(t, 
IRTypeDataflow, dataflow.GetType()) + }) + + t.Run("works with global scope", func(t *testing.T) { + dataflow := &DataflowIR{ + Type: "dataflow", + Sources: []CallMatcherIR{ + {Type: "call_matcher", Patterns: []string{"input"}}, + }, + Sinks: []CallMatcherIR{ + {Type: "call_matcher", Patterns: []string{"execute"}}, + }, + Sanitizers: []CallMatcherIR{}, + Propagation: []PropagationIR{ + {Type: "assignment", Metadata: map[string]interface{}{"key": "value"}}, + }, + Scope: "global", + } + + assert.Equal(t, IRTypeDataflow, dataflow.GetType()) + }) +} + +func TestIRTypeConstants(t *testing.T) { + t.Run("IR type constants are defined correctly", func(t *testing.T) { + assert.Equal(t, IRType("call_matcher"), IRTypeCallMatcher) + assert.Equal(t, IRType("variable_matcher"), IRTypeVariableMatcher) + assert.Equal(t, IRType("dataflow"), IRTypeDataflow) + assert.Equal(t, IRType("logic_and"), IRTypeLogicAnd) + assert.Equal(t, IRType("logic_or"), IRTypeLogicOr) + assert.Equal(t, IRType("logic_not"), IRTypeLogicNot) + }) +} + +func TestMatcherIR_Interface(t *testing.T) { + t.Run("CallMatcherIR implements MatcherIR interface", func(t *testing.T) { + var matcher MatcherIR = &CallMatcherIR{ + Type: "call_matcher", + Patterns: []string{"test"}, + } + assert.Equal(t, IRTypeCallMatcher, matcher.GetType()) + }) + + t.Run("VariableMatcherIR implements MatcherIR interface", func(t *testing.T) { + var matcher MatcherIR = &VariableMatcherIR{ + Type: "variable_matcher", + Pattern: "test_var", + } + assert.Equal(t, IRTypeVariableMatcher, matcher.GetType()) + }) + + t.Run("DataflowIR implements MatcherIR interface", func(t *testing.T) { + var matcher MatcherIR = &DataflowIR{ + Type: "dataflow", + Scope: "local", + } + assert.Equal(t, IRTypeDataflow, matcher.GetType()) + }) +} diff --git a/sourcecode-parser/main_test.go b/sourcecode-parser/main_test.go index cdac8417..e33f140e 100644 --- a/sourcecode-parser/main_test.go +++ b/sourcecode-parser/main_test.go @@ -23,7 +23,7 @@ func TestExecute(t 
*testing.T) { { name: "Successful execution", mockExecuteErr: nil, - expectedOutput: "Code Pathfinder is designed for identifying vulnerabilities in source code.\n\nUsage:\n pathfinder [command]\n\nAvailable Commands:\n analyze Analyze source code for security vulnerabilities using call graph\n ci CI mode - Python DSL implementation in progress\n completion Generate the autocompletion script for the specified shell\n diagnose Validate intra-procedural taint analysis against LLM ground truth\n help Help about any command\n query Query mode - Python DSL implementation in progress\n resolution-report Generate a diagnostic report on call resolution statistics\n scan Scan mode - Python DSL implementation in progress\n version Print the version and commit information\n\nFlags:\n --disable-metrics Disable metrics collection\n -h, --help help for pathfinder\n --verbose Verbose output\n\nUse \"pathfinder [command] --help\" for more information about a command.\n", + expectedOutput: "Code Pathfinder is designed for identifying vulnerabilities in source code.\n\nUsage:\n pathfinder [command]\n\nAvailable Commands:\n analyze Analyze source code for security vulnerabilities using call graph\n ci CI mode with SARIF or JSON output for CI/CD integration\n completion Generate the autocompletion script for the specified shell\n diagnose Validate intra-procedural taint analysis against LLM ground truth\n help Help about any command\n query Query code using Python DSL rules\n resolution-report Generate a diagnostic report on call resolution statistics\n scan Scan code for security vulnerabilities using Python DSL rules\n version Print the version and commit information\n\nFlags:\n --disable-metrics Disable metrics collection\n -h, --help help for pathfinder\n --verbose Verbose output\n\nUse \"pathfinder [command] --help\" for more information about a command.\n", expectedExit: 0, }, }