
This repository contains a project completed for GMIT's Theory of Algorithms module. It involves writing a program in the C programming language to perform the Secure Hash Algorithm (SHA), specifically the 256-bit version known as SHA-256. The implementation is a single C file that calculates the SHA-256 checksum of an input.


kbarry91/4th-year-SHA-256-Algorithm


4th-year-SHA-256-Algorithm

This repository contains a project completed for GMIT's Theory of Algorithms module. It involves writing a program in the C programming language to perform the Secure Hash Algorithm (SHA), specifically the 256-bit version known as SHA-256. The implementation is a single C file that calculates the SHA-256 checksum of an input. The algorithm is based on the Secure Hash Standard (FIPS 180-4) published by the National Institute of Standards and Technology (NIST).


About the SHA-256 standard

This standard specifies hash algorithms that can be used to generate digests of messages. The digests are used to detect whether messages have been changed since the digests were generated. The table below presents the basic properties of the algorithm.

| Algorithm | Message Size (bits) | Block Size (bits) | Word Size (bits) | Message Digest Size (bits) |
|---|---|---|---|---|
| SHA-256 | < 2^64 | 512 | 32 | 256 |

Preprocessing

Before SHA-256 can be performed on an input, some preprocessing is required. The steps are outlined in Section 5 of the Federal Information Processing Standards Publication (FIPS PUB) 180-4.

  1. Padding the message (Section 5.1).
  2. Parse message into message blocks (Section 5.2).
  3. Set initial hash value (Section 5.3).
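For step 3, FIPS 180-4 (Section 5.3.3) defines the initial hash value H(0) as eight fixed 32-bit words. A minimal sketch of how they might be held in C; the names SHA256_H0 and sha256_init are illustrative and are not taken from sha256.c.

```c
#include <stdint.h>

/* Initial hash value H(0) for SHA-256 (FIPS 180-4, Section 5.3.3): the first
   32 bits of the fractional parts of the square roots of the first eight primes. */
static const uint32_t SHA256_H0[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

/* Hypothetical helper: copies H(0) into the hash state before the first block. */
void sha256_init(uint32_t H[8])
{
    for (int i = 0; i < 8; i++)
        H[i] = SHA256_H0[i];
}
```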

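Padding the message (to a multiple of 512 bits)

  • Append a single "1" bit to the end of the message.
  • Append k "0" bits, where k is the smallest non-negative number that leaves exactly 64 bits free at the end of the last 512-bit block (l + 1 + k ≡ 448 mod 512, where l is the message length in bits).
  • In the remaining 64 bits, encode the message length l in binary as a big-endian number.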
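A minimal in-memory sketch of these three rules, assuming the whole message fits in one buffer. The function names sha256_padded_length and sha256_pad are hypothetical and are not the code in sha256.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Padded size in bytes: message + one 0x80 byte + 8-byte length,
   rounded up to the next multiple of 64 bytes (512 bits). */
size_t sha256_padded_length(size_t len)
{
    return ((len + 1 + 8 + 63) / 64) * 64;
}

/* Hypothetical helper: copies `msg` into `out` and appends the padding.
   `out` must hold at least sha256_padded_length(len) bytes.
   Returns the padded length. */
size_t sha256_pad(const uint8_t *msg, size_t len, uint8_t *out)
{
    size_t total = sha256_padded_length(len);
    uint64_t bit_len = (uint64_t)len * 8;          /* l, the length in bits */

    if (len > 0)
        memcpy(out, msg, len);
    out[len] = 0x80;                               /* "1" bit, then seven "0" bits */
    memset(out + len + 1, 0, total - len - 1 - 8); /* the rest of the "0" bits */
    for (int i = 0; i < 8; i++)                    /* l as a 64-bit big-endian number */
        out[total - 1 - i] = (uint8_t)(bit_len >> (8 * i));
    return total;
}
```

For "abc" (24 bits) this yields one 64-byte block whose last byte is 0x18, while a 56-byte message already needs 128 bytes because the length no longer fits after the "1" bit.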

Steps

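  • Read 64 bytes at a time from the file into the message block.
  • If the last fread returns fewer than 56 bytes, put all the padding into the last message block: a "1" bit followed by seven "0" bits (the byte 0x80), "0" bits up to byte 56, then the 64-bit length.
  • If the last read returns 56 to 63 bytes, there is not enough room left in the block for the length:
    • Append the "1" bit and fill the rest of the block with "0" bits.
    • Create a new message block that contains only padding ("0" bits, then the length).
  • If the file length is an exact multiple of 512 bits:
    • Create another message block whose first bit is "1",
    • followed by "0" bits,
    • with the last 64 bits holding the number of bits in the original file.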
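The same steps can be sketched as a small state machine around fread(). This is an illustration under stated assumptions, not the code in sha256.c: next_block, put_length and enum pad_state are hypothetical names. One difference from the list above is that the exact-multiple-of-512-bits case needs no separate branch here: after the last full block, the next fread() returns 0 bytes, so the short-read branch builds the padding-only block (0x80, zeros, length).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reader state: still reading, one padding-only block left, or finished. */
enum pad_state { PAD_READING, PAD_EXTRA_BLOCK, PAD_DONE };

/* Writes the message length in bits into the last 8 bytes, big-endian. */
static void put_length(uint8_t block[64], uint64_t bit_len)
{
    for (int i = 0; i < 8; i++)
        block[63 - i] = (uint8_t)(bit_len >> (8 * i));
}

/* Fills `block` with the next 64-byte message block from `f`.
   Returns 1 if a block was produced and 0 when the message is finished.
   `*bit_len` must start at 0 and `*state` at PAD_READING. */
int next_block(FILE *f, uint8_t block[64], uint64_t *bit_len, enum pad_state *state)
{
    size_t n;

    if (*state == PAD_DONE)
        return 0;

    if (*state == PAD_EXTRA_BLOCK) {
        /* Padding-only block: the "1" bit went into the previous block. */
        memset(block, 0, 56);
        put_length(block, *bit_len);
        *state = PAD_DONE;
        return 1;
    }

    n = fread(block, 1, 64, f);
    *bit_len += (uint64_t)n * 8;

    if (n == 64)
        return 1;                          /* full block, keep reading */

    block[n] = 0x80;                       /* "1" bit + seven "0" bits */
    if (n < 56) {
        /* Room for everything: zeros up to byte 56, then the length. */
        memset(block + n + 1, 0, 55 - n);
        put_length(block, *bit_len);
        *state = PAD_DONE;
    } else {
        /* 56..63 bytes: zero-fill this block; the length goes in an extra one. */
        memset(block + n + 1, 0, 63 - n);
        *state = PAD_EXTRA_BLOCK;
    }
    return 1;
}
```

A caller would loop `while (next_block(f, block, &bits, &state))` and hash each block in turn.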

SHA-256 Hash Computation

Each message block M(1), M(2), ..., M(N) is processed in order using the steps defined below:

  1. Prepare the message schedule W(t) for t = 0 to 63.
  2. Initialize the eight working variables a, b, c, d, e, f, g, h with the previous hash value H(i-1).
  3. For t = 0 to 63, compute new values for the working variables.
  4. Compute the ith intermediate hash value H(i) by adding a, b, ..., h to the corresponding words of H(i-1).

After repeating steps one to four N times, the resulting 256-bit message digest of M is the concatenation H0(N) || H1(N) || ... || H7(N) of the eight 32-bit words of the final hash value.
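A minimal sketch of steps one to four for a single block, written from the FIPS 180-4 definitions rather than copied from sha256.c; the function name sha256_compress and the ROTR macro are assumptions. The caller passes H(i-1) in H and the function overwrites it with H(i).

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* SHA-256 round constants K(0)..K(63), FIPS 180-4 Section 4.2.2. */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

/* Processes one 64-byte block: H holds H(i-1) on entry and H(i) on return. */
void sha256_compress(uint32_t H[8], const uint8_t block[64])
{
    uint32_t W[64], a, b, c, d, e, f, g, h;

    /* 1. Message schedule: 16 big-endian words from the block, then 48 derived words. */
    for (int t = 0; t < 16; t++)
        W[t] = (uint32_t)block[4 * t] << 24 | (uint32_t)block[4 * t + 1] << 16 |
               (uint32_t)block[4 * t + 2] << 8 | (uint32_t)block[4 * t + 3];
    for (int t = 16; t < 64; t++) {
        uint32_t s0 = ROTR(W[t - 15], 7) ^ ROTR(W[t - 15], 18) ^ (W[t - 15] >> 3);
        uint32_t s1 = ROTR(W[t - 2], 17) ^ ROTR(W[t - 2], 19) ^ (W[t - 2] >> 10);
        W[t] = s1 + W[t - 7] + s0 + W[t - 16];
    }

    /* 2. Working variables start from the previous hash value. */
    a = H[0]; b = H[1]; c = H[2]; d = H[3];
    e = H[4]; f = H[5]; g = H[6]; h = H[7];

    /* 3. 64 rounds. */
    for (int t = 0; t < 64; t++) {
        uint32_t S1 = ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25);
        uint32_t ch = (e & f) ^ (~e & g);
        uint32_t T1 = h + S1 + ch + K[t] + W[t];
        uint32_t S0 = ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22);
        uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        uint32_t T2 = S0 + maj;
        h = g; g = f; f = e; e = d + T1;
        d = c; c = b; b = a; a = T1 + T2;
    }

    /* 4. Intermediate hash value H(i). */
    H[0] += a; H[1] += b; H[2] += c; H[3] += d;
    H[4] += e; H[5] += f; H[6] += g; H[7] += h;
}
```

Starting from the initial hash value H(0) and calling this once per padded block leaves the final digest words H0(N) ... H7(N) in H.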


Prerequisites

The only requirement for this program is a C compiler. There are two ways to get one:

  1. Install a C compiler on the machine.
  2. Use an online service such as OnlineGDB to compile the C file.

Running the program

Download

Clone this repository to your machine:

  • Navigate to the directory where you want the project.
  • In a command prompt, enter:
	> git clone https://github.com/kbarry91/4th-year-SHA-256-Algorithm.git

Compile the program

Navigate to the downloaded repository and enter:

	> gcc -o sha256 sha256.c

This will compile the program and add a sha256 executable to the directory.

Execute the program

The program has been designed to work in 3 different ways:

  1. Enter the file as a command-line argument.
  2. Enter the filename as a string at runtime.
  3. Enter a string to be hashed at runtime.

Command-line argument

To hash a file from the command line, enter the executable followed by the file to be hashed:

> ./sha256 filename.txt

Runtime (File input)

The program checks whether a file was entered as an argument. If not, you will be given the option to select 1 and enter the file name; simply enter the path and file name. To run the program:

> ./sha256 

Runtime (String Input)

The program allows the user to enter a string and compute its checksum. Simply select option 2 at the main menu and enter the string. To run the program:

> ./sha256 


Testing

The algorithm was tested using the test vectors approved by the National Institute of Standards and Technology, available at DI Management. Tests were run on both Linux and Windows machines and returned the same results. To verify the results, each checksum was compared with the results from two other resources:

  1. sha256_checksum.
  2. onlinemd5.

The files for the following tests have been added to the test-files folder. When testing via a command-line argument or at runtime, the files can be referenced by their relative path, for example test-files/test1.txt.

Test 1

| Input | Expected Result | Actual Result | PASS/FAIL |
|---|---|---|---|
| abc | ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad | ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad | PASS |

Test 2

| Input | Expected Result | Actual Result | PASS/FAIL |
|---|---|---|---|
| empty string "" | e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855 | e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855 | PASS |

Test 3

| Input | Expected Result | Actual Result | PASS/FAIL |
|---|---|---|---|
| abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq | 248d6a61 d20638b8 e5c02693 0c3e6039 a33ce459 64ff2167 f6ecedd4 19db06c1 | 248d6a61 d20638b8 e5c02693 0c3e6039 a33ce459 64ff2167 f6ecedd4 19db06c1 | PASS |

Test 4

| Input | Expected Result | Actual Result | PASS/FAIL |
|---|---|---|---|
| The file sha256.c | d10c44abceaca287a2fae35c70436b4ba30286baaac0ad044e3fcf9612799fe8 | f8b8ff1646d46d6cebf61110ccde701b0f076904d2dcf3a1177cd6c6d86f561c | FAIL (see Limitations below) |

Features of the implementation

File Checking

A file must be successfully opened in order to run the program. This is achieved with fopen() and a check for NULL. If a file isn't provided as an argument, the user can enter a file name, and that file is then checked.

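	FILE *file;
	char fileName[256]; // Buffer size chosen for illustration.

	// Check if a file was entered as a cmd argument (argc/argv are main()'s parameters).
	if (argc < 2)
	{
		printf("No file specified as argument.\nPlease enter a file name: ");
		// Limit the read to the buffer size so a long name cannot overflow fileName.
		if (scanf("%255s", fileName) != 1)
		{
			printf("[ERROR]: No file name entered.\n");
			return 1;
		}
		printf("Searching for %s.....\n", fileName);

		// Binary mode hashes the bytes exactly as stored (text mode on Windows translates CR LF).
		file = fopen(fileName, "rb");
	}
	else
	{
		file = fopen(argv[1], "rb");
	}

	// Check if the file opened successfully.
	if (file == NULL)
	{
		printf("[ERROR]: Could not open file.\n");
	}
	else
	{
		// Run the Secure Hash Algorithm on the file.
		printf("[FILE READ SUCCESS]: Now running sha256 Hash Computation.....\n");
		sha256(file);
	}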

File Writing

After the SHA-256 of an input has been successfully computed, the checksum is saved to a new file in a folder called saved-hashes. As the file entered may contain a file path and an extension, the path and extension are removed from the file name using the <libgen.h> library. The name is then given a .txt extension and prefixed with saved-hashes/ to form the path of the saved file.
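A minimal sketch of this naming scheme, assuming the POSIX basename() from <libgen.h> (available on Linux and under MinGW on Windows, but not with MSVC). The helper name build_hash_path and the 1024-byte working buffer are hypothetical; the extension is stripped with strrchr() because <libgen.h> itself only removes the directory part.

```c
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turns an input path such as "test-files/test1.txt"
   into "saved-hashes/test1.txt". Returns 0 on success, -1 on error. */
int build_hash_path(const char *input, char *out, size_t out_size)
{
    char copy[1024];
    const char *name;
    const char *dot;
    size_t len;
    int written;

    /* basename() may modify its argument, so give it a private copy. */
    if (strlen(input) >= sizeof copy)
        return -1;
    strcpy(copy, input);
    name = basename(copy);                 /* strip the directory part */

    /* Drop the last extension, but leave names such as ".hidden" alone. */
    dot = strrchr(name, '.');
    len = (dot != NULL && dot != name) ? (size_t)(dot - name) : strlen(name);

    written = snprintf(out, out_size, "saved-hashes/%.*s.txt", (int)len, name);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Printing only the first len characters with "%.*s" avoids writing into the string returned by basename(), which POSIX allows to point at internal storage.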

User Input

The program allows the user to enter a string and generate its checksum. This is done by saving the string to a file at test-files/userinput.txt. Once saved, the SHA-256 is calculated as normal.
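A minimal sketch of saving the string, assuming it was read with fgets() (which keeps the trailing newline). The helper name save_user_input is hypothetical. The point it illustrates is that any newline left in the string, or added by text-mode writing, becomes part of the hashed bytes, so "abc" would no longer match Test 1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: saves one line typed by the user to `path` so it can be
   hashed like any other file. A trailing "\n" (or "\r\n") such as fgets() leaves
   is dropped, because those bytes would otherwise change the checksum.
   Returns 0 on success, -1 on error. */
int save_user_input(const char *path, const char *line)
{
    size_t len = strlen(line);
    FILE *out;

    if (len > 0 && line[len - 1] == '\n') {
        len--;
        if (len > 0 && line[len - 1] == '\r')
            len--;
    }

    out = fopen(path, "wb");          /* binary mode: no newline translation */
    if (out == NULL)
        return -1;
    if (fwrite(line, 1, len, out) != len) {
        fclose(out);
        return -1;
    }
    return fclose(out) == 0 ? 0 : -1;
}
```

In use, a line read with `fgets(line, sizeof line, stdin)` would be passed as `save_user_input("test-files/userinput.txt", line)`; unlike `scanf("%s")`, fgets() also keeps any spaces inside the string.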

Endian Check

Little-endian and big-endian are two ways of storing multibyte data types (int, float, etc.). On little-endian machines, the last (least significant) byte of the binary representation of a multibyte value is stored first, at the lowest address. On big-endian machines, the first (most significant) byte is stored first.

To check whether a machine uses big-endian or little-endian, the following macro was used:

	#define IS_BIG_ENDIAN (*(uint16_t *)"\0\xff" < 0x100)

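The macro reinterprets the two bytes 0x00 and 0xFF as a 16-bit integer. A big-endian machine reads them as 0x00FF (255), which is less than 0x100, so the macro evaluates to 1; a little-endian machine reads 0xFF00, so it evaluates to 0.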
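A variant of the same test, sketched as a function (the name is_big_endian is hypothetical). It copies the bytes with memcpy() instead of casting a char pointer to uint16_t *, which sidesteps unaligned access and strict-aliasing concerns while performing the identical comparison.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian machine and 0 on a little-endian one. */
int is_big_endian(void)
{
    const unsigned char bytes[2] = {0x00, 0xFF};
    uint16_t value;

    memcpy(&value, bytes, sizeof value);
    return value < 0x100;   /* big-endian reads 0x00FF, little-endian reads 0xFF00 */
}
```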

Convert little-endian to big-endian

	#define CONVERT_UINT32(x) (((x) >> 24) | (((x)&0x00FF0000) >> 8) | (((x)&0x0000FF00) << 8) | ((x) << 24))

Convert big-endian to little-endian

	// x must be a 64-bit unsigned value (uint64_t); shifting a narrower type by 56 is undefined.
	#define CONVERT_UINT64(x) \
	((((x) >> 56) & 0x00000000000000FF) | (((x) >> 40) & 0x000000000000FF00) | \
	 (((x) >> 24) & 0x0000000000FF0000) | (((x) >> 8 ) & 0x00000000FF000000) | \
	 (((x) << 8 ) & 0x000000FF00000000) | (((x) << 24) & 0x0000FF0000000000) | \
	 (((x) << 40) & 0x00FF000000000000) | (((x) << 56) & 0xFF00000000000000))
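An alternative technique, not the one used in sha256.c, is to build big-endian values with shifts on the byte values themselves. Because this never depends on how the host lays out memory, it gives the same result on little- and big-endian machines, and neither the endianness check nor the swap macros are needed. The names load_be32 and store_be64 are hypothetical.

```c
#include <stdint.h>

/* Reads four bytes as a big-endian 32-bit word (e.g. one word of a message block). */
uint32_t load_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Writes a 64-bit value (e.g. the message length in bits) as 8 big-endian bytes. */
void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}
```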

Limitations, known Bugs and Improvements

Limitations

The program appears to produce invalid checksums when tested on large files. After extensive research, I have narrowed the fault down to how line breaks are used to terminate lines on different operating systems, such as a PC running Windows and a web server running Linux.

  • Windows - Uses CR and LF characters to terminate lines.
  • UNIX - Uses only a single LF character.
  • Classic Mac OS - Uses a single CR character (macOS, being UNIX-based, uses LF).

This can also be caused by the text editor used to create the file: some editors, such as Notepad++, adjust the line endings to suit the current operating system. The solution below, highlighted by Hanselman, is how .NET deals with the issue when writing a file: it chooses the line ending for the platform.

  public static String NewLine {
    get {
      Contract.Ensures(Contract.Result<String>() != null);
  #if !PLATFORM_UNIX
      return "\r\n";
  #else
      return "\n";
  #endif // !PLATFORM_UNIX
    }
  }

To solve this issue when reading from a file byte by byte, I would check for a line ending and remove it; for example, on Windows, remove the 0D (CR) byte so that the newline is only \n. Due to lack of time to test the solution extensively, I did not implement the change, but the research involved will help me deal with this problem before it arises in future projects.
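A minimal sketch of the byte-by-byte approach described above, assuming the file is opened in binary mode ("rb") so that the C runtime does no translation of its own (in text mode on Windows, CR LF is already turned into LF on read). The function name next_normalised_byte is hypothetical. Note that hashing the normalised bytes will differ from tools that hash the raw file, so this trades agreement with those tools for consistency across line-ending conventions.

```c
#include <stdio.h>

/* Returns the next byte of `in`, turning each CR LF pair into a single LF.
   A lone CR is returned unchanged. Returns EOF at the end of the file. */
int next_normalised_byte(FILE *in)
{
    int c = fgetc(in);
    if (c == '\r') {
        int next = fgetc(in);
        if (next == '\n')
            return '\n';          /* CR LF -> LF */
        if (next != EOF)
            ungetc(next, in);     /* lone CR: push the following byte back */
    }
    return c;
}
```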

Improvements

I feel that the code design could be improved to produce a much cleaner file, mainly because of extra features that I decided to implement at the last minute. I have started practising an agile approach to deliver a product's business requirements at a faster pace, and if more time were available I would carry out this code clean-up in my next sprint.

For example, the code uses two functions for writing files. These could be redesigned to abstract what differs, allowing one function to be used for both the hash output and the user-input string.

	int writeToFile(uint32_t hash[]);
	int writeToFileInput(char inputString[]);
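One possible shape for that refactor, sketched under assumptions: the hash is first formatted as text, so a single writer can handle both cases. The names format_hash and write_text_file are hypothetical and do not exist in sha256.c.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats the eight 32-bit hash words as 64 lowercase hex digits plus '\0'. */
void format_hash(const uint32_t hash[8], char out[65])
{
    for (int i = 0; i < 8; i++)
        snprintf(out + 8 * i, 9, "%08" PRIx32, hash[i]);
}

/* Writes `text` to `path` in binary mode, so the bytes on disk are exactly
   the characters in `text`. Returns 0 on success, -1 on error. */
int write_text_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    if (fputs(text, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

With this split, writeToFile(hash) would become format_hash() followed by write_text_file(), and writeToFileInput(inputString) a direct call to write_text_file(); only the text being written differs.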

Although the program works as it should, given more time I would also extract the menu operations from the main function to tidy up the code.


References

In order to complete this project, a lot of research went into both the SHA-256 algorithm and the C programming language. Any code adapted from outside sources is referenced in the sha256.c file and complies with all licenses and policies[^policies].

Below is a list of some other resources used to conduct research:

[^policies]: This project complies with the Quality Assurance Framework at GMIT, which includes the Code of Student Conduct and the Policy on Plagiarism.
