32 changes: 32 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,32 @@
name: Run Tests

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  test:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        node-version: [18.x, 20.x]

    steps:
    - uses: actions/checkout@v3

    - name: Use Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v3
      with:
        node-version: ${{ matrix.node-version }}

    - name: Install dependencies
      run: npm install

    - name: Run article utility tests
      run: node src/lib/articleUtils.test.js

    - name: Run feed utility tests
      run: node src/lib/feedUtils.test.js
2 changes: 1 addition & 1 deletion .gitignore
@@ -7,4 +7,4 @@ aws-config.json
 .env
 dump.rdb
 npm-debug.log
-lib
+/lib
103 changes: 103 additions & 0 deletions TESTING.md
@@ -0,0 +1,103 @@
# Testing Guide for Feed Processing Functions

This document describes the refactored, testable feed processing utilities and how to run tests.

## Overview

The feed processing logic has been extracted into small, testable functions in `src/lib/`:

- **`articleUtils.js`** - Pure functions for article hashing and scoring (no external dependencies)
- **`feedUtils.js`** - Utility functions for feed processing (headers, Redis keys, validation, etc.)

## Running Tests

```bash
node src/lib/articleUtils.test.js
node src/lib/feedUtils.test.js
```

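All tests use the test data in `testdata/test-cases.yaml` (loaded with `js-yaml`), which contains expected inputs and outputs generated from the actual Node.js implementation. Run the commands from the repository root: `articleUtils.test.js` opens the data file with a path relative to the working directory.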

## Test Coverage

### Article Functions (`articleUtils.js`)

1. **`hash(article)`** - MD5 hash of article GUID
   - Tests: 3 test cases verifying hash consistency
   - Implementation matches the previous inline version in `src/articles.js` exactly

2. **`score(article)`** - Unix timestamp score, in milliseconds
   - Tests: 3 test cases with different date field names (pubDate, pubdate, date)
   - Implementation matches the previous inline version in `src/articles.js` exactly
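
A quick sketch of how these two functions behave. The function bodies are copied from `src/lib/articleUtils.js` so the snippet runs on its own; the sample articles and their URLs are made up for illustration and are not taken from the test data file.

```javascript
const crypto = require('crypto');

// Copied from src/lib/articleUtils.js so this snippet is self-contained.
function hash(article) {
  return crypto.createHash('md5').update(article.guid).digest('hex');
}

function score(article) {
  const articleDate = article.pubDate || article.pubdate || article.date;
  return Date.parse(articleDate) || Date.now();
}

// Hypothetical sample articles: same GUID, different date field names.
const a = { guid: 'https://example.com/post/1', pubDate: '2020-01-01T00:00:00Z' };
const b = { guid: 'https://example.com/post/1', date: '2020-01-01T00:00:00Z' };

console.log(hash(a) === hash(b)); // true: the hash depends only on the GUID
console.log(score(a)); // 1577836800000
console.log(score(a) === score(b)); // true: any of the three date fields works
```

An article with no parseable date falls back to `Date.now()`, which is why the score tests can only check the type of the result in that case.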

### Feed Functions (`feedUtils.js`)

1. **`buildRequestHeaders(storedFeed)`** - Builds HTTP headers for conditional GET
   - Tests: 4 test cases (no headers, If-Modified-Since, If-None-Match, both)

2. **`buildRedisKeys(feedURI)`** - Creates Redis key names
   - Tests: 2 test cases with different feed URLs

3. **`buildArticleKey(hash)`** - Creates article key for Redis sorted set
   - Tests: 1 test case verifying format

4. **`processArticle(article, feedURI, hashFn, scoreFn)`** - Adds computed fields
   - Tests: 1 test case verifying hash, score, and feedurl are added

5. **`shouldStoreArticle(oldScore, newScore)`** - Determines if article needs S3 storage
   - Tests: 4 test cases (new article, changed score, unchanged score, type coercion)

6. **`isValidArticle(article)`** - Validates article has required fields
   - Tests: 4 test cases (valid, missing guid, missing description, null)

7. **`extractFeedMetadata(meta)`** - Extracts title and link from parser meta
   - Tests: 1 test case

8. **`extractArticleIds(articleKeys)`** - Strips "article:" prefix from Redis keys
   - Tests: 1 test case
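
The diff does not include `src/lib/feedUtils.js` itself, so the following is only a sketch of what four of these helpers might look like, reconstructed from the inline code that `src/feeds.js` used before the refactor. The `USER_AGENT` constant name, the default parameter, and the use of `slice` in place of the original `substr(8)` are assumptions; the real file may differ.

```javascript
// Sketch only: inferred from the pre-refactor code in src/feeds.js.
const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36';

// Adds conditional GET headers only when the stored feed has the values.
function buildRequestHeaders(storedFeed = {}) {
  const headers = { 'user-agent': USER_AGENT };
  if (storedFeed.lastModified) headers['If-Modified-Since'] = storedFeed.lastModified;
  if (storedFeed.etag) headers['If-None-Match'] = storedFeed.etag;
  return headers;
}

// Key for the feed hash and key for the sorted set of its articles.
function buildRedisKeys(feedURI) {
  return { feedKey: `feed:${feedURI}`, articlesKey: `articles:${feedURI}` };
}

// Member name stored in the articles sorted set.
function buildArticleKey(hash) {
  return `article:${hash}`;
}

// 'article:' is 8 characters long, matching the original substr(8).
function extractArticleIds(articleKeys) {
  return articleKeys.map(key => key.slice('article:'.length));
}

console.log(buildRedisKeys('https://example.com/rss'));
console.log(extractArticleIds([buildArticleKey('abc123')])); // [ 'abc123' ]
```

Keeping these as plain functions over strings and objects is what lets the tests run without a Redis connection.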

## Test Data Format

The `testdata/test-cases.yaml` file contains test cases organized by function:

```yaml
hash_function_tests: [...]
score_function_tests: [...]
request_headers_tests: [...]
# ...
```

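Each test case has:
- `description` - Human-readable test description
- `input` - Input value(s) for the function
- `expected` - Expected output value
- `expected_type` - Optional; score cases set it to `timestamp` when the result falls back to `Date.now()` and only its type can be checked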
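
To illustrate how cases in this shape drive a test run, here is a minimal sketch of a data-driven runner. The `runCases` helper, the inline `cases` array, and the `extractArticleIds` stand-in are all hypothetical; the real tests load their cases from the data file and use the runner in `src/lib/articleUtils.test.js`.

```javascript
const { deepStrictEqual } = require('assert');

// Stand-in for the function under test (assumed behavior: strip 'article:').
const extractArticleIds = keys => keys.map(key => key.slice('article:'.length));

// Hypothetical cases using the documented description/input/expected fields.
const cases = [
  { description: 'strips the prefix', input: ['article:abc', 'article:def'], expected: ['abc', 'def'] },
  { description: 'handles an empty list', input: [], expected: [] },
];

// Runs every case through fn and returns pass/fail counts.
function runCases(testCases, fn) {
  const summary = { passed: 0, failed: 0 };
  testCases.forEach((testCase) => {
    try {
      deepStrictEqual(fn(testCase.input), testCase.expected);
      summary.passed += 1;
      console.log(`✓ ${testCase.description}`);
    } catch (error) {
      summary.failed += 1;
      console.error(`✗ ${testCase.description}: ${error.message}`);
    }
  });
  return summary;
}

const summary = runCases(cases, extractArticleIds);
console.log(summary); // { passed: 2, failed: 0 }
```

Because the cases are plain data, the same file can later be fed to another implementation without rewriting the expectations.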

## Adding New Tests

1. Add test data to `testdata/test-cases.yaml`
2. Add corresponding test code in `src/lib/articleUtils.test.js` or `src/lib/feedUtils.test.js`
3. Run tests to verify
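
As a rough illustration of these steps, the sketch below adds cases for `shouldStoreArticle` and checks them. The function body mirrors the inline condition that `src/feeds.js` used before the refactor (with an explicit radix added to `parseInt`); the case list and its field layout are hypothetical and would normally live in the test data file rather than inline.

```javascript
const { strictEqual } = require('assert');

// Sketch of shouldStoreArticle. Redis returns the old score as a string
// (or null when the member is missing), so it is parsed before comparing.
function shouldStoreArticle(oldScore, newScore) {
  return oldScore === null || newScore !== parseInt(oldScore, 10);
}

// Step 1: hypothetical test data in the documented shape.
const shouldStoreCases = [
  { description: 'new article', input: { oldScore: null, newScore: 1000 }, expected: true },
  { description: 'changed score', input: { oldScore: '999', newScore: 1000 }, expected: true },
  { description: 'unchanged score stored as a string', input: { oldScore: '1000', newScore: 1000 }, expected: false },
];

// Step 2: test code that walks the cases.
shouldStoreCases.forEach(({ description, input, expected }) => {
  strictEqual(shouldStoreArticle(input.oldScore, input.newScore), expected, description);
  console.log(`✓ ${description}`);
});
```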

## Future Work

Next steps:
1. Refactor `src/feeds.js` to use these utility functions
2. Add integration tests for Redis and S3 operations
3. Create Go implementation with matching behavior (in `feedfetcher/` directory)
4. Create Go tests that use the same `testdata/test-cases.yaml` file

## Why These Functions?

These functions were extracted because they are:
1. **Pure or nearly pure** - Deterministic output for given input
2. **Core business logic** - Critical for feed processing correctness
3. **Reusable** - Can be used by both Node.js and Go implementations
4. **Independently testable** - No mocking of Redis/S3 needed

The goal is to ensure both Node.js and Go implementations produce identical results for:
- Article hashing (critical for deduplication)
- Article scoring (critical for sorting)
- Request headers (critical for conditional GET optimization)
- Redis key naming (critical for data storage)
- S3 storage decisions (critical for performance)
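
The per-article path these helpers cover (validate, then add computed fields) can be sketched end to end as follows. This is an illustration under assumptions, not the code in `src/lib/feedUtils.js`: in particular, whether `processArticle` returns a copy or mutates its argument is not visible in the diff, and the sample article is made up.

```javascript
const crypto = require('crypto');

// Mirrors the inline check removed from src/feeds.js.
function isValidArticle(article) {
  return Boolean(article && article.guid && article.description);
}

// Assumption: returns a copy with the computed fields added. The
// pre-refactor code set these fields directly on the article object.
function processArticle(article, feedURI, hashFn, scoreFn) {
  return Object.assign({}, article, {
    hash: hashFn(article),
    score: scoreFn(article),
    feedurl: feedURI,
  });
}

// Same logic as src/lib/articleUtils.js, inlined to keep this runnable.
const hash = article => crypto.createHash('md5').update(article.guid).digest('hex');
const score = article =>
  Date.parse(article.pubDate || article.pubdate || article.date) || Date.now();

// Hypothetical article as a feed parser might emit it.
const raw = { guid: 'urn:example:1', description: 'Hello', pubDate: '2020-01-01T00:00:00Z' };

if (isValidArticle(raw)) {
  const processed = processArticle(raw, 'https://example.com/rss', hash, score);
  console.log(processed.feedurl, processed.score); // https://example.com/rss 1577836800000
}
```

Passing `hashFn` and `scoreFn` in as arguments keeps `processArticle` free of imports, which is what makes it testable in isolation.
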
3 changes: 2 additions & 1 deletion package.json
@@ -47,6 +47,7 @@
     "eslint-config-airbnb": "^11.1.0",
     "eslint-plugin-import": "^1.15.0",
     "eslint-plugin-jsx-a11y": "^2.2.2",
-    "eslint-plugin-react": "^6.2.0"
+    "eslint-plugin-react": "^6.2.0",
+    "js-yaml": "^4.1.0"
   }
 }
15 changes: 5 additions & 10 deletions src/articles.js
@@ -1,16 +1,11 @@
-import crypto from 'crypto';
 import AWS from 'aws-sdk';
 import labels from './labels';
+// Import hash and score functions from testable utilities
+import { hash as hashArticle, score as scoreArticle } from './lib/articleUtils.js';

-export function hash(article) {
-  return crypto.createHash('md5').update(article.guid).digest('hex');
-}
-
-export function score(article) {
-  const articleDate = article.pubDate || article.pubdate || article.date;
-  const articleScore = Date.parse(articleDate) || Date.now();
-  return articleScore;
-}
+// Re-export for backward compatibility
+export const hash = hashArticle;
+export const score = scoreArticle;

 function post(req, res) {
   res.json({
39 changes: 21 additions & 18 deletions src/feeds.js
@@ -3,6 +3,16 @@ import FeedParser from 'feedparser';
 import request from 'request';
 import AWS from 'aws-sdk';
 import { hash, score } from './articles';
+import {
+  buildRequestHeaders,
+  buildRedisKeys,
+  buildArticleKey,
+  processArticle,
+  shouldStoreArticle,
+  isValidArticle,
+  extractArticleIds,
+  generateArticleBody,
+} from './lib/feedUtils.js';

 const redisURL = process.env.REDIS_URL;
 const redisClient = redis.createClient(redisURL);
@@ -83,7 +93,7 @@ function get(req, res) {
 const feed = storedFeed;
 feed.key = feedurl;
 feeds.push(feed);
-const articleIDs = articles.map(key => key.substr(8));
+const articleIDs = extractArticleIds(articles);
 if (feedurlPosition === feedurls.length - 1) {
   res.json({
     success: true,
@@ -111,17 +121,12 @@ const feed = {
 const params = { Bucket: 'feedreader2018-articles' };
 const s3 = new AWS.S3({ params });
 const feedURI = decodeURIComponent(req.url.slice(10));
-const feedKey = `feed:${feedURI}`;
-const articlesKey = `articles:${feedURI}`;
+const { feedKey, articlesKey } = buildRedisKeys(feedURI);

 redisClient.hgetall(feedKey, (e, storedFeed) => {
   let fetchedFeed = {};
   if ((!e) && storedFeed) fetchedFeed = storedFeed;
-  const headers = {
-    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36',
-  };
-  if (fetchedFeed.lastModified) headers['If-Modified-Since'] = fetchedFeed.lastModified;
-  if (fetchedFeed.etag) headers['If-None-Match'] = fetchedFeed.etag;
+  const headers = buildRequestHeaders(fetchedFeed);

   const requ = request({
     uri: feedURI,
@@ -187,16 +192,14 @@ const feed = {
 const stream = this;
 for (;;) {
   const article = stream.read();
-  if (!article || !article.guid || !article.description) {
+  if (!isValidArticle(article)) {
     return;
   }
-  article.hash = hash(article);
-  article.score = score(article);
-  article.feedurl = feedURI;

-  const key = article.hash;
-  const rank = article.score;
-  const articleKey = `article:${key}`;
+  const processedArticle = processArticle(article, feedURI, hash, score);
+  const key = processedArticle.hash;
+  const rank = processedArticle.score;
+  const articleKey = buildArticleKey(key);

   redisClient.zscore(articlesKey, articleKey, (zscoreErr, oldscore) => {
     if (zscoreErr) {
@@ -211,9 +214,9 @@ const feed = {
   articleAddErr.type = 'Redis Error';
   articleAddErr.log = zaddErr.message;
   stream.emit('error', articleAddErr);
-} else if ((oldscore === null) || (rank !== parseInt(oldscore))) {
+} else if (shouldStoreArticle(oldscore, rank)) {
   // Only stringify when we actually need to store it
-  const body = JSON.stringify(article);
+  const body = generateArticleBody(processedArticle);
   s3.putObject({
     Key: key,
     Body: body,
@@ -245,7 +248,7 @@ const feed = {
   });
 } else {
   fetchedFeed.success = true;
-  fetchedFeed.articles = allArticles.map(key => key.substr(8));
+  fetchedFeed.articles = extractArticleIds(allArticles);
   res.json(fetchedFeed);
 }
 });
28 changes: 28 additions & 0 deletions src/lib/articleUtils.js
@@ -0,0 +1,28 @@
// Pure utility functions for article processing (no external dependencies)
// These can be tested without AWS or Redis

const crypto = require('crypto');

/**
 * Generates MD5 hash of article GUID
 * Reference: api/src/articles.js hash() function
 * @param {Object} article - Article object with guid field
 * @returns {string} MD5 hash in hex format
 */
function hash(article) {
  return crypto.createHash('md5').update(article.guid).digest('hex');
}

/**
 * Generates score (timestamp) for article
 * Reference: api/src/articles.js score() function
 * @param {Object} article - Article object with date fields
 * @returns {number} Unix timestamp in milliseconds
 */
function score(article) {
  const articleDate = article.pubDate || article.pubdate || article.date;
  const articleScore = Date.parse(articleDate) || Date.now();
  return articleScore;
}

module.exports = { hash, score };
63 changes: 63 additions & 0 deletions src/lib/articleUtils.test.js
@@ -0,0 +1,63 @@
// Tests for article utility functions (hash and score)
// Run with: node src/lib/articleUtils.test.js

const { hash, score } = require('./articleUtils.js');
const fs = require('fs');
const yaml = require('js-yaml');
const assert = require('assert');

// Load test cases from YAML
const testCasesYaml = fs.readFileSync('./testdata/test-cases.yaml', 'utf8');
const testCases = yaml.load(testCasesYaml);

// Simple test runner
let passed = 0;
let failed = 0;

function test(name, fn) {
  try {
    fn();
    passed++;
    console.log(`✓ ${name}`);
  } catch (error) {
    failed++;
    console.error(`✗ ${name}`);
    console.error(` ${error.message}`);
  }
}

// Run all tests
console.log('\n=== Testing Article Utility Functions ===\n');

// Test hash function
testCases.hash_function_tests.forEach((testCase) => {
  test(testCase.description, () => {
    const result = hash(testCase.input);
    assert.strictEqual(result, testCase.expected,
      `Hash mismatch: got ${result}, expected ${testCase.expected}`);
  });
});

// Test score function
testCases.score_function_tests.forEach((testCase) => {
  test(testCase.description, () => {
    const result = score(testCase.input);
    if (testCase.expected_type === 'timestamp') {
      // For invalid dates that fallback to Date.now(), just check it's a number
      assert.strictEqual(typeof result, 'number',
        `Score should be a number: got ${typeof result}`);
      assert.ok(result > 0, `Score should be positive: got ${result}`);
    } else {
      assert.strictEqual(result, testCase.expected,
        `Score mismatch: got ${result}, expected ${testCase.expected}`);
    }
  });
});

// Print summary
console.log(`\n=== Test Summary ===`);
console.log(`Passed: ${passed}`);
console.log(`Failed: ${failed}`);
console.log(`Total: ${passed + failed}\n`);

process.exit(failed > 0 ? 1 : 0);